Flurex animal, PFOS and SCFA data
1 INFO
This document contains the commands needed to analyse the experimental data obtained from Flurex (internal project name: R20-22). The project data contains:
- Animal weight data, including calculated weights per bw and normalized weight data in decimals:
  + body weight (bw) from day 0 to day 8, including bw gain from day 0 to day 8
  + liver and cecum weight from dissection on day 8
- PFOS quantitative data:
  + total dosed PFOS per rat on day 4 and day 8, respectively (mg)
  + blood on day 4 and day 8 in ug PFOS/mL serum, including calculations of
    - total blood volume per animal, based on a standard average of 64 mL blood per kilogram in rats (Diehl et al. 2001)
    - concentration of PFOS on each day (ug/mL)
    - total PFOS in blood volume (mg)
    - total PFOS detected in blood as a share of the total dosed per day (pct)
  + liver from dissection on day 8 in ug PFOS/g tissue, including calculations of
    - total PFOS in liver per rat
    - concentration of PFOS (ug/g)
    - total PFOS detected in liver as a share of the total dosed on day 8 (pct)
  + isomer proportions of branched and linear PFOS, presented as branched-linear ratio (bl-ratio)
- Short-chain fatty acids: quantification of 10 compounds in colonic water from day 8, given in mM:
  + acetic acid (acetate)
  + formic acid (formate)
  + propanoic acid (propionate)
  + 2-methyl-propanoic acid (isobutyrate)
  + butanoic acid (butyrate)
  + 3-methyl-butanoic acid (isovalerate)
  + pentanoic acid (valerate)
  + 4-methyl-pentanoic acid (isocaproate)
  + hexanoic acid (caproate)
  + heptanoic acid (enanthate)
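The derived blood quantities listed above follow from simple arithmetic. The sketch below walks through it for one hypothetical animal. The input values are invented for illustration, and the variable names only mirror column names that appear later in the data (for example blood_vol4_mL and pfos_serum4_pct), so the exact formulas used to build the data set are an assumption.

```r
# Hypothetical example values; NOT taken from the study data
bw_g            <- 300   # body weight in grams
pfos_serum_ugml <- 20    # PFOS concentration in ug/mL
tot_pfos_mg     <- 1.5   # total PFOS dosed up to this day in mg

# Standard average blood volume in rats (Diehl et al. 2001)
blood_ml_per_kg <- 64

# Total blood volume per animal (mL)
blood_vol_mL <- bw_g / 1000 * blood_ml_per_kg

# Total PFOS in the blood volume (ug, then mg)
pfos_serum_ug <- pfos_serum_ugml * blood_vol_mL
pfos_serum_mg <- pfos_serum_ug / 1000

# Share of the dosed PFOS that is found in blood (pct)
pfos_serum_pct <- pfos_serum_mg / tot_pfos_mg * 100

c(blood_vol_mL = blood_vol_mL,
  pfos_serum_mg = pfos_serum_mg,
  pfos_serum_pct = pfos_serum_pct)
```

With these invented inputs the animal has 19.2 mL of blood carrying 0.384 mg PFOS, which is 25.6 pct of the dose.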
2 Setup
The following code loads the packages, creates the necessary folders and saves the parameters for the subsequent analyses.
knitr::opts_chunk$set(echo = TRUE)
# Load libraries
library(tidyverse)
library(phyloseq)
library(decontam)
library(pals)
library(ggpubr)
library(vegan)
library(phangorn)
library(kableExtra)
library(plotly)
library(rstatix)
library(forcats)
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggbreak)
library(ggrepel)
library(DAtest)
library(cowplot)
library(pheatmap)
# Create used folders if missing, including the weight subfolder that ggsave() writes to later
for (d in c("R_objects", "plots/animal_data/weight", "scripts")) {
  if (!dir.exists(d)) dir.create(d, recursive = TRUE)
}
# Save params
saveRDS(params, file = "R_objects/animal_params.RDS")
3 LOAD DATA
This section loads the data from CSV format and saves it in Rdata format.
## Error in eval(expr, envir, enclos): cannot change value of locked binding for 'params'
# Load analysis data
dat <- read.csv(params$input, header = TRUE, sep = ";", dec = ",")
save(dat, file = "R_objects/animal_data.Rdata")
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
4 ANIMAL WEIGHT DATA
The animal weight data comprise body weight throughout the entire study period, the calculated body weight gain, and the organ weights of cecum and liver.
4.1 Body weight gain
This section prepares and performs the data analysis for body weight gain.
4.1.1 Statistics
4.1.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
dat.clean <- dat
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "bw_gain"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL bw_gain 12 17.1 2.26
## 2 PFOS bw_gain 12 17.2 4.04
## 3 VAN bw_gain 12 17.6 2.69
## 4 VAN+PFOS bw_gain 12 17.5 2.65
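The formula construction in the chunk above is easiest to follow with more than one predictor. The sketch below repeats the same string-pasting idea with the commented-out predictor set ("pfos" and "van"); the lowercase variable names are chosen here for illustration only.

```r
# Build a model formula from character vectors
outcome    <- "bw_gain"
predictors <- c("pfos", "van")

# Several predictors are joined with "*" (main effects plus interaction);
# a single predictor is used as-is
rhs <- if (length(predictors) > 1) paste(predictors, collapse = "*") else predictors
model_formula <- as.formula(paste(outcome, rhs, sep = " ~ "))

deparse(model_formula)                      # "bw_gain ~ pfos * van"
attr(terms(model_formula), "term.labels")   # "pfos" "van" "pfos:van"
```

With a single predictor such as "treatment" the same code yields bw_gain ~ treatment, which is the one-way model used in this section.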
4.1.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group, and there is no relationship between the observations in each group. This has already been ensured for the whole project.
- No significant outliers in any of the groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R01 1 no no 310 321. 325. 339. 350 354
## 2 PFOS R30 18 yes no 258. 262. 265. 271. 270. 274
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
The data contain two outliers: the samples from rat_name R01 and R30.
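The boxplot rule behind identify_outliers() can be reproduced in a few lines of base R: a value is flagged as an outlier when it lies more than 1.5 times the interquartile range (IQR) outside the quartiles, and as extreme when it lies more than 3 times the IQR outside. The sketch below applies the rule to an invented vector, not to the study data.

```r
# Invented example values; the last one is deliberately far from the rest
x <- c(10, 12, 11, 13, 12, 11, 30)

# Quartiles and interquartile range
q   <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Boxplot rule: beyond 1.5 x IQR is an outlier, beyond 3 x IQR is extreme
is_outlier <- x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
is_extreme <- x < q[1] - 3 * iqr   | x > q[2] + 3 * iqr

data.frame(x, is_outlier, is_extreme)
```

Only the value 30 is flagged here, and it is also extreme. In the report the rule is applied within each treatment group, so a value can be an outlier relative to its own group without being unusual for the whole data set.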
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.951 0.0457
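The chunk that produced the Shapiro-Wilk table is not echoed in this report. A minimal base R sketch of the same check is given below, using the built-in PlantGrowth data as a stand-in for the study data; rstatix::shapiro_test() wraps the base function used here, and the 0.05 cut-off is an assumption about how the result is read.

```r
# Stand-in data: PlantGrowth ships with R (weight by group)
model <- lm(weight ~ group, data = PlantGrowth)

# Shapiro-Wilk test on the model residuals
sw <- shapiro.test(residuals(model))
sw$statistic
sw$p.value

# Assumed decision rule: residuals are treated as normal when p > 0.05
normal_ok <- sw$p.value > 0.05
normal_ok
```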
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.530 0.664
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
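The later chunks branch on a variable called EQUAL.VAR, which is not defined in any chunk echoed here. One plausible way to set it is from the Levene p-value, as sketched below. This is an assumption, not the report's actual code. The test itself is computed by hand as an ANOVA on absolute deviations from the group medians, which corresponds to the median-centred Levene test, and PlantGrowth stands in for the study data.

```r
# Stand-in data: PlantGrowth ships with R (weight by group)
d <- PlantGrowth

# Absolute deviation of each observation from its group median
z <- abs(d$weight - ave(d$weight, d$group, FUN = median))

# Levene's test is a one-way ANOVA on these deviations
lev   <- anova(lm(z ~ d$group))
lev_p <- lev[["Pr(>F)"]][1]
lev

# Hypothetical definition of the flag used by the later chunks
EQUAL.VAR <- lev_p > 0.05
EQUAL.VAR
```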
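This shows that the body weight gain data has two outliers and equal variance (Levene’s test, p = 0.664). With all animals included, the Shapiro-Wilk p-value is 0.0457, marginally below 0.05; without the outliers the data is normally distributed according to the Shapiro-Wilk test. Therefore we use a one-way ANOVA test followed by Tukey’s honestly significant difference (HSD) test.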
4.1.1.3 ANOVA One-Way test
4.1.1.3.1 Perform test
With equal variances we run a standard one-way ANOVA with anova_test(); if the variances differ we run welch_anova_test() instead.
if(EQUAL.VAR) {
res.aov <- dat.clean %>% anova_test(FORMULA)
res.aov
} else {
res.aov <- dat.clean %>% welch_anova_test(FORMULA)
res.aov
}
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 treatment 3 44 0.064 0.979 0.004
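The difference between the two branches can be seen with the base function oneway.test(), which welch_anova_test() wraps. The sketch below runs both variants on the built-in PlantGrowth data as a stand-in for the study data.

```r
# Classic one-way ANOVA (assumes equal variances)
classic <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)

# Welch one-way ANOVA (does not assume equal variances)
welch <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)

# F statistics of both variants
c(classic = unname(classic$statistic), welch = unname(welch$statistic))

# Denominator degrees of freedom: the Welch correction lowers them
c(classic = unname(classic$parameter[2]), welch = unname(welch$parameter[2]))
```

The Welch variant pays for its robustness with fewer denominator degrees of freedom, which is why the classic test is preferred when Levene's test gives no reason to doubt equal variances.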
4.1.1.3.2 Perform posthoc test
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. After a Welch one-way test, which relaxes the equal-variance assumption, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.
if(EQUAL.VAR) {
pwc <- dat.clean %>% tukey_hsd(FORMULA)
pwc
} else {
pwc <- dat.clean %>% games_howell_test(FORMULA)
pwc
}
## # A tibble: 6 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 treat… CTRL PFOS 0 0.111 -3.15 3.37 1 ns
## 2 treat… CTRL VAN 0 0.463 -2.79 3.72 0.981 ns
## 3 treat… CTRL VAN+P… 0 0.374 -2.88 3.63 0.99 ns
## 4 treat… PFOS VAN 0 0.353 -2.90 3.61 0.991 ns
## 5 treat… PFOS VAN+P… 0 0.264 -2.99 3.52 0.996 ns
## 6 treat… VAN VAN+P… 0 -0.0893 -3.35 3.17 1 ns
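The table above has six rows because four groups give choose(4, 2) = 6 pairwise comparisons. The base R function TukeyHSD(), which tukey_hsd() builds on, returns the same kind of table; the sketch below uses the built-in PlantGrowth data (three groups, hence three comparisons) as a stand-in.

```r
# Number of pairwise comparisons for four treatment groups
choose(4, 2)

# Tukey HSD on a stand-in data set with three groups
fit <- aov(weight ~ group, data = PlantGrowth)
tk  <- TukeyHSD(fit)

# One row per pair: difference, confidence limits and adjusted p-value
tk$group
```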
4.1.2 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "Bodyweight gain",limits = c(5,25),breaks = seq(5,25,5), labels = function(x) paste0(x, "%")) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
# Output plot
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
Cecum weight (grams)
This section prepares and performs the data analysis for cecum weight data in grams.
4.1.3 Statistics
4.1.3.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
dat.clean <- subset(dat, !is.na(cecum_norm))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "cecum_wt"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL cecum_wt 11 5.08 1.13
## 2 PFOS cecum_wt 12 4.70 1.23
## 3 VAN cecum_wt 12 9.41 1.27
## 4 VAN+PFOS cecum_wt 11 9.97 1.14
4.1.3.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group, and there is no relationship between the observations in each group. This has already been ensured for the whole project.
- No significant outliers in any of the groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 6 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R01 1 no no 310 321. 325. 339. 350 354
## 2 PFOS R25 13 yes no 339. 340. 353. 364. 348. 358
## 3 VAN R15 27 no yes 268. 277. 283. 290. 296. 300
## 4 VAN+PFOS R43 43 yes yes 292. 301. 300. 313. 316. 322
## 5 VAN+PFOS R44 44 yes yes 261. 269. 277 284. 287. 296
## 6 VAN+PFOS R47 47 yes yes 242. 249. 255. 263. 267. 271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
The data contain six non-critical outliers.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.954 0.0655
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 42 0.0879 0.966
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that the cecum weight data (grams) has six non-critical outliers, is normally distributed (Shapiro-Wilk p = 0.0655) and has equal variance (Levene’s test, p = 0.966). Therefore we use an ANOVA test followed by Tukey’s honestly significant difference (HSD) test.
4.1.3.3 ANOVA One-Way test
4.1.3.3.1 Perform test
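With equal variances we run a standard ANOVA with anova_test(); if the variances differ we run welch_anova_test() instead. Note that the equal-variance branch in this section fits the factorial model cecum_wt ~ pfos*van, so the resulting table reports the main effects of pfos and van and their interaction rather than a single treatment effect.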
if(EQUAL.VAR) {
res.aov <- dat.clean %>% anova_test(cecum_wt ~ pfos*van)
res.aov
} else {
res.aov <- dat.clean %>% welch_anova_test(FORMULA)
res.aov
}
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 pfos 1 42 0.066 7.99e-01 0.002
## 2 van 1 42 185.208 5.41e-17 * 0.815
## 3 pfos:van 1 42 1.730 1.96e-01 0.040
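The four treatment groups are the four cells of a 2 x 2 design (PFOS yes/no by vancomycin yes/no), so a one-way model on treatment and a factorial model on pfos*van describe the same group means and differ only in how the between-group variation is split up. The sketch below illustrates this on simulated data; the numbers are invented and only the group labels follow the report.

```r
set.seed(1)

# Simulated 2 x 2 design with 6 animals per cell (invented data)
d <- expand.grid(rep = 1:6, pfos = c("no", "yes"), van = c("no", "yes"))
d$treatment <- with(d, ifelse(pfos == "no"  & van == "no", "CTRL",
                       ifelse(pfos == "yes" & van == "no", "PFOS",
                       ifelse(pfos == "no", "VAN", "VAN+PFOS"))))

# Outcome with a vancomycin effect only
d$y <- rnorm(nrow(d), mean = 5 + 4 * (d$van == "yes"))

fit_one <- lm(y ~ treatment, data = d)   # one-way: a single 3-df effect
fit_two <- lm(y ~ pfos * van, data = d)  # factorial: three 1-df effects

# Same residual sum of squares and residual degrees of freedom
c(one_way = deviance(fit_one), factorial = deviance(fit_two))
c(one_way = df.residual(fit_one), factorial = df.residual(fit_two))

# The factorial table separates pfos, van and their interaction
anova(fit_two)
```

The factorial table answers whether the vancomycin effect depends on PFOS (the interaction row), which the one-way table cannot show directly.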
4.1.3.3.2 Perform posthoc test
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. After a Welch one-way test, which relaxes the equal-variance assumption, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.
if(EQUAL.VAR) {
pwc <- dat.clean %>% tukey_hsd(FORMULA)
pwc
} else {
pwc <- dat.clean %>% games_howell_test(FORMULA)
pwc
}
## # A tibble: 6 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 treatment CTRL PFOS 0 -0.374 -1.71 0.961 8.77e- 1
## 2 treatment CTRL VAN 0 4.34 3.00 5.67 3.69e-10
## 3 treatment CTRL VAN+PFOS 0 4.89 3.53 6.26 2.36e-11
## 4 treatment PFOS VAN 0 4.71 3.41 6.02 2 e-11
## 5 treatment PFOS VAN+PFOS 0 5.27 3.93 6.60 2.39e-12
## 6 treatment VAN VAN+PFOS 0 0.554 -0.780 1.89 6.85e- 1
## # ℹ 1 more variable: p.adj.signif <chr>
4.1.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "grams",limits = c(0,15),breaks = seq(0,15,5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(12,14,13,15))
p
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
4.2 Cecum weight (normalized)
This section prepares and performs the data analysis for normalized cecum weight data.
4.2.1 Statistics
4.2.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
dat.clean <- subset(dat, !is.na(cecum_norm))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "cecum_norm"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL cecum_norm 11 1 0.148
## 2 PFOS cecum_norm 12 0.944 0.164
## 3 VAN cecum_norm 12 1.88 0.24
## 4 VAN+PFOS cecum_norm 11 2.07 0.201
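The CTRL group has a mean of exactly 1 in the summary above, which suggests that the normalized values are expressed relative to the control mean. The sketch below shows one way such a column could be derived. Both the invented numbers and the assumption that cecum weight is first divided by body weight and then by the CTRL mean are illustrative; the report does not show how cecum_norm was actually calculated.

```r
# Invented example data for two groups
d <- data.frame(
  treatment = rep(c("CTRL", "VAN"), each = 3),
  cecum_wt  = c(5.0, 4.5, 5.5, 9.0, 10.0, 9.5),  # grams (invented)
  bw_8      = c(350, 340, 360, 345, 355, 350)    # grams (invented)
)

# Assumed step 1: organ weight relative to body weight
d$cecum_wt_bw <- d$cecum_wt / d$bw_8

# Assumed step 2: scale so that the CTRL group has mean 1
ctrl_mean    <- mean(d$cecum_wt_bw[d$treatment == "CTRL"])
d$cecum_norm <- d$cecum_wt_bw / ctrl_mean

# Group means of the normalized values
aggregate(cecum_norm ~ treatment, data = d, FUN = mean)
```

On this scale a group mean of 1.88 reads as 88 pct above the control mean.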
4.2.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group, and there is no relationship between the observations in each group. This has already been ensured for the whole project.
- No significant outliers in any of the groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN R15 27 no yes 268. 277. 283. 290. 296. 300
## 2 VAN R24 36 no yes 281. 286. 294. 305. 309. 312
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.984 0.753
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 42 0.416 0.742
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that the normalised cecum weight data has two outliers, is normally distributed (Shapiro-Wilk p = 0.753) and has equal variance (Levene’s test, p = 0.742). Therefore we use a one-way ANOVA test followed by Tukey’s honestly significant difference (HSD) test.
4.2.1.3 ANOVA One-Way test
4.2.1.3.1 Perform test
With equal variances we run a standard one-way ANOVA with anova_test(); if the variances differ we run welch_anova_test() instead.
if(EQUAL.VAR) {
res.aov <- dat.clean %>% anova_test(FORMULA)
res.aov
} else {
res.aov <- dat.clean %>% welch_anova_test(FORMULA)
res.aov
}
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 treatment 3 42 106.226 1.21e-19 * 0.884
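The ges column is the generalized eta squared effect size. For a one-way between-subjects design it reduces to SS_effect / (SS_effect + SS_error), which can be recovered from the F statistic and the degrees of freedom alone. The helper below is a hypothetical convenience function, not part of rstatix; it reproduces the ges values of the one-way tables in this report.

```r
# Effect size for a one-way between-subjects ANOVA, from F and its df
# (hypothetical helper; equals SS_effect / (SS_effect + SS_error))
ges_oneway <- function(F_value, DFn, DFd) {
  (F_value * DFn) / (F_value * DFn + DFd)
}

# Normalized cecum weight: F = 106.226 on 3 and 42 df
round(ges_oneway(106.226, 3, 42), 3)  # 0.884

# Body weight gain: F = 0.064 on 3 and 44 df
round(ges_oneway(0.064, 3, 44), 3)    # 0.004
```

A value of 0.884 means that treatment accounts for roughly 88 pct of the total variation in normalized cecum weight, whereas it accounts for well under 1 pct of the variation in body weight gain.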
4.2.1.3.2 Perform posthoc test
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. After a Welch one-way test, which relaxes the equal-variance assumption, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.
if(EQUAL.VAR) {
pwc <- dat.clean %>% tukey_hsd(FORMULA)
pwc
} else {
pwc <- dat.clean %>% games_howell_test(FORMULA)
pwc
}
## # A tibble: 6 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 treatment CTRL PFOS 0 -0.0562 -0.271 0.158 8.96e- 1
## 2 treatment CTRL VAN 0 0.884 0.669 1.10 1.42e-12
## 3 treatment CTRL VAN+PFOS 0 1.07 0.850 1.29 1.06e-12
## 4 treatment PFOS VAN 0 0.940 0.730 1.15 1.09e-12
## 5 treatment PFOS VAN+PFOS 0 1.13 0.911 1.34 1.06e-12
## 6 treatment VAN VAN+PFOS 0 0.186 -0.0287 0.400 1.1 e- 1
## # ℹ 1 more variable: p.adj.signif <chr>
4.2.2 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% difference",limits = c(0.5,3.1),breaks = seq(0.5,3.1,0.5), labels = function(x) paste0(x*100, "%")) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(2.2,2.8,2.5,3.1))
p
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
4.3 Liver weight (grams)
This section prepares and performs the data analysis for liver weight data in grams.
4.3.1 Statistics
4.3.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(liver_norm))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "liver_wt"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL liver_wt 12 11.2 1.12
## 2 PFOS liver_wt 12 12.0 1.51
## 3 VAN liver_wt 12 10.5 0.973
## 4 VAN+PFOS liver_wt 12 11.3 1.47
4.3.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group, and there is no relationship between the observations in each group. This has already been ensured for the whole project.
- No significant outliers in any of the groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R01 1 no no 310 321. 325. 339. 350 354
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains one non-critical outlier.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.989 0.922
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 1.50 0.227
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that the liver weight data (grams) has one non-critical outlier, is normally distributed (Shapiro-Wilk p = 0.922) and has equal variance (Levene’s test, p = 0.227). Therefore we use an ANOVA test followed by Tukey’s honestly significant difference (HSD) test.
4.3.1.3 ANOVA One-Way test
4.3.1.3.1 Perform test
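With equal variances we run a standard ANOVA with anova_test(); if the variances differ we run welch_anova_test() instead. Note that the equal-variance branch in this section fits the factorial model liver_wt ~ pfos*van, so the resulting table reports the main effects of pfos and van and their interaction rather than a single treatment effect.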
if(EQUAL.VAR) {
res.aov <- dat.clean %>% anova_test(liver_wt ~ pfos*van) #FORMULA
res.aov
} else {
res.aov <- dat.clean %>% welch_anova_test(FORMULA)
res.aov
}
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 pfos 1 44 4.977000 0.031 * 1.02e-01
## 2 van 1 44 3.492000 0.068 7.40e-02
## 3 pfos:van 1 44 0.000905 0.976 2.06e-05
4.3.1.3.2 Perform posthoc test
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. After a Welch one-way test, which relaxes the equal-variance assumption, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.
if(EQUAL.VAR) {
pwc <- dat.clean %>% tukey_hsd(FORMULA) #FORMULA
pwc
} else {
pwc <- dat.clean %>% games_howell_test(FORMULA)
pwc
}
## # A tibble: 6 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 trea… CTRL PFOS 0 0.820 -0.587 2.23 0.414 ns
## 2 trea… CTRL VAN 0 -0.708 -2.11 0.699 0.541 ns
## 3 trea… CTRL VAN+P… 0 0.135 -1.27 1.54 0.994 ns
## 4 trea… PFOS VAN 0 -1.53 -2.93 -0.121 0.0287 *
## 5 trea… PFOS VAN+P… 0 -0.685 -2.09 0.722 0.568 ns
## 6 trea… VAN VAN+P… 0 0.843 -0.565 2.25 0.39 ns
4.3.1.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "grams") + # liver_wt is an organ weight in grams; optional: limits = c(8,17), breaks = seq(8,17,2)
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE) #, y.position = c(1.35,1.4,1.45,1.5)
p
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
4.4 Liver weight (normalized)
This section prepares and performs the data analysis for normalized liver weight data.
4.4.1 Statistics
4.4.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(liver_norm))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "liver_norm"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL liver_norm 12 1 0.055
## 2 PFOS liver_norm 12 1.08 0.055
## 3 VAN liver_norm 12 0.934 0.044
## 4 VAN+PFOS liver_norm 12 1.03 0.065
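As a minimal sketch of what the formula construction above produces, the snippet below builds the model formula for a single predictor and for the three-predictor alternative left in the code comment. `build_formula` is a hypothetical helper introduced only for illustration; it is not part of the original script.

```r
# Hypothetical helper mirroring the PREDICTOR.F / FORMULA lines above
build_formula <- function(outcome, predictor) {
  # Several predictors are joined with "*" (main effects plus interactions)
  rhs <- if (length(predictor) > 1) paste(predictor, collapse = "*") else predictor
  as.formula(paste(outcome, rhs, sep = " ~ "))
}

f_one <- build_formula("liver_norm", "treatment")
f_three <- build_formula("liver_norm", c("treatment", "pfos", "van"))
print(f_one)
print(f_three)
```

With one predictor the result is a plain one-way model; with three, the `*` operator expands to all main effects and interactions.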
4.4.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This is already ensured for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## [1] treatment rat_name ordering pfos
## [5] van bw_0 bw_1 bw_2
## [9] bw_3 bw_4 bw_5 bw_6
## [13] bw_7 bw_8 bw_gain cecum_wt
## [17] cecum_wt_bw cecum_norm liver_wt liver_wt_bw
## [21] liver_norm tot_pfos4 blood_vol4_mL pfos_serum4_ugml
## [25] pfos_serum4_ug pfos_serum4_mg pfos_serum4_pct tot_pfos8
## [29] blood_vol8_mL pfos_serum8_ugml pfos_serum8_ug pfos_serum8_mg
## [33] pfos_serum8_pct pfos_change48_pct pfos_liver_ugg pfos_liver_mg
## [37] pfos_liver_pct acetic formic propanoic
## [41] m2_propanoic butanoic m3_butanoic pentanoic
## [45] m4_pentanoic hexanoic heptanoic is.outlier
## [49] is.extreme
## <0 rows> (or 0-length row.names)
Data contains zero outliers.
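The boxplot method referred to above flags values that lie more than 1.5 × IQR outside the quartiles, and marks values beyond 3 × IQR as extreme. A minimal base R sketch of that rule for a single group follows; `flag_outliers` and the toy vector are assumptions for illustration, and the quantile definition used internally by rstatix may differ in detail.

```r
# Boxplot-rule outlier flags for one group (illustrative only)
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  data.frame(
    value = x,
    is.outlier = x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr,
    is.extreme = x < q[1] - 3 * iqr | x > q[2] + 3 * iqr
  )
}

toy <- c(10, 11, 12, 13, 14, 15, 40)  # made-up values, not study data
res_out <- flag_outliers(toy)
res_out[res_out$is.outlier, ]
```

In the analysis itself the rule is applied within each treatment group, which is why a value can be an outlier in one group and unremarkable in another.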
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.985 0.778
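The Shapiro-Wilk table above is computed on the residuals of the linear model. A minimal sketch of the same check in base R is given here, using simulated data as a stand-in for `dat.clean`; the group means and standard deviation are assumptions loosely based on the summary table, and `shapiro.test()` from base R is used instead of the rstatix wrapper.

```r
set.seed(1)
# Simulated stand-in for dat.clean: 4 groups of 12 animals, values are made up
toy <- data.frame(
  treatment = rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12),
  liver_norm = rnorm(48, mean = rep(c(1, 1.08, 0.93, 1.03), each = 12), sd = 0.055)
)
model <- lm(liver_norm ~ treatment, data = toy)
sw <- shapiro.test(residuals(model))
sw$statistic  # W statistic
sw$p.value    # p > 0.05 gives no evidence against normality
```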
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.430 0.733
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that the normalised liver weight data has no outliers, is normally distributed and has equal variance. Therefore we use a one-way ANOVA test with Tukey’s honestly significant difference (HSD) post-hoc test.
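The test code in this report relies on an `EQUAL.VAR` flag that is not defined in the code shown here. One plausible way to derive it, offered only as an assumption, is from the Levene's test p-value. The sketch below writes the median-centred Levene's test out in base R on simulated data; `levene_p` is a hypothetical helper.

```r
# Levene's test (median-centred) written out in base R
levene_p <- function(y, group) {
  group <- factor(group)
  dev <- abs(y - ave(y, group, FUN = median))  # absolute deviation from group median
  anova(lm(dev ~ group))[["Pr(>F)"]][1]
}

set.seed(2)
toy <- data.frame(
  treatment = rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12),
  y = rnorm(48, mean = 1, sd = 0.05)  # simulated, not study data
)
# Treat variances as equal when Levene's test is non-significant
EQUAL.VAR <- levene_p(toy$y, toy$treatment) > 0.05
EQUAL.VAR

# Two groups with very different spread should give a small p-value
p_unequal <- levene_p(c(rnorm(30, sd = 0.01), rnorm(30, sd = 10)),
                      rep(c("a", "b"), each = 30))
p_unequal
```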
4.4.1.3 ANOVA One-Way test
4.4.1.3.1 Perform test
With the assumptions checked, we can now run a one-way ANOVA: anova_test() if the variances are equal, or welch_anova_test() if the variances differ.
if(EQUAL.VAR) {
res.aov <- dat.clean %>% anova_test(FORMULA)
res.aov
} else {
res.aov <- dat.clean %>% welch_anova_test(FORMULA)
res.aov
}
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 treatment 3 44 15.909 3.79e-07 * 0.52
4.4.1.3.2 Perform posthoc test
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. When the Welch one-way test is used instead, the Games-Howell post-hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.
if(EQUAL.VAR) {
pwc <- dat.clean %>% tukey_hsd(FORMULA)
pwc
} else {
pwc <- dat.clean %>% games_howell_test(FORMULA)
pwc
}
## # A tibble: 6 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 treatment CTRL PFOS 0 0.0846 0.0246 0.145 0.0027
## 2 treatment CTRL VAN 0 -0.0663 -0.126 -0.00623 0.0254
## 3 treatment CTRL VAN+PFOS 0 0.0352 -0.0249 0.0953 0.409
## 4 treatment PFOS VAN 0 -0.151 -0.211 -0.0909 0.000000182
## 5 treatment PFOS VAN+PFOS 0 -0.0494 -0.110 0.0107 0.14
## 6 treatment VAN VAN+PFOS 0 0.102 0.0415 0.162 0.000269
## # ℹ 1 more variable: p.adj.signif <chr>
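To make the post-hoc table easier to read, the sketch below runs a Tukey HSD in base R (`aov()` plus `TukeyHSD()`, not the rstatix wrapper used in this report) on simulated data and confirms that each estimate is simply the difference between two group means. The data are made up and only loosely follow the summary table.

```r
set.seed(3)
toy <- data.frame(
  treatment = factor(rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12)),
  liver_norm = rnorm(48, mean = rep(c(1, 1.08, 0.93, 1.03), each = 12), sd = 0.055)
)
fit <- aov(liver_norm ~ treatment, data = toy)
tuk <- TukeyHSD(fit)$treatment  # matrix with columns diff, lwr, upr, p adj

# Each "diff" is the difference between the two group means
grp_means <- tapply(toy$liver_norm, toy$treatment, mean)
tuk["PFOS-CTRL", "diff"]
grp_means[["PFOS"]] - grp_means[["CTRL"]]
```

The same reading applies to the table above: the CTRL vs PFOS estimate of 0.0846 is the PFOS mean minus the CTRL mean.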
4.4.1.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% difference",limits = c(0.75,1.5),breaks = seq(0.75,1.5,0.25), labels = function(x) paste0(x*100, "%")) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(1.35,1.4,1.45,1.5))
p
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/weight/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
5 PFOS QUANTITATIVE DATA
The following section handles data analysis of PFOS from serum and liver samples (run on a Dionex Ultimate 3000 / Bruker EVOQ Elite UPLC-MS/MS against a linear PFOS standard curve and with an internal MPFOS standard).
5.1 Blood serum day 4
This section prepares and performs the data analysis for PFOS data from serum on day 4.
5.1.1 ug/mL in serum
5.1.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_serum4_ugml))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_serum4_ugml"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL pfos_serum4_ugml 11 0 0.001
## 2 PFOS pfos_serum4_ugml 12 9.17 2.01
## 3 VAN pfos_serum4_ugml 11 0.001 0.001
## 4 VAN+PFOS pfos_serum4_ugml 10 10.0 1.18
5.1.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This is already ensured for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 5 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R06 6 no no 274. 276. 285. 296. 297. 302
## 2 CTRL R11 11 no no 271. 278. 287. 294. 293. 299
## 3 PFOS R29 17 yes no 239. 243. 248. 255. 259. 267
## 4 VAN R13 25 no yes 218. 222. 228. 234 231. 241
## 5 VAN R21 33 no yes 262. 268. 274. 285. 281. 291
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains five outliers: samples from rat_name R06, R11, R29, R13 and R21.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.807 0.00000412
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 40 7.48 0.000436
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that the day 4 serum PFOS data has five outliers, fails the Shapiro-Wilk test of normality (p < 0.001) and fails Levene’s test for equal variance (p < 0.001). Therefore we use a non-parametric Kruskal-Wallis test followed by Dunn’s post-hoc test with p-value adjustment.
5.1.1.3 Kruskal-Wallis test
5.1.1.3.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 pfos_serum4_ugml 44 35.4 3 0.000000101 Kruskal-Wallis
5.1.1.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 pfos_serum4_ugml 44 0.809 eta2[H] large
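The eta squared based on the H-statistic is computed as (H - k + 1) / (n - k), where H is the Kruskal-Wallis statistic, k the number of groups and n the total number of observations. As a quick check, the values printed in the Kruskal-Wallis output above can be plugged in directly; H is taken as printed (rounded), so the result agrees with the reported effect size only up to rounding.

```r
# Values read off the Kruskal-Wallis output above
H <- 35.4  # test statistic, rounded as printed
k <- 4     # number of treatment groups
n <- 44    # total number of observations

eta2_H <- (H - k + 1) / (n - k)
eta2_H  # about 0.81, in line with the reported 0.809
```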
5.1.1.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 pfos_serum4_… CTRL PFOS 11 12 3.86 1.15e-4 1.85e-4 ***
## 2 pfos_serum4_… CTRL VAN 11 11 0.0172 9.86e-1 9.86e-1 ns
## 3 pfos_serum4_… CTRL VAN+P… 11 10 4.53 5.87e-6 1.91e-5 ****
## 4 pfos_serum4_… PFOS VAN 12 11 -3.84 1.23e-4 1.85e-4 ***
## 5 pfos_serum4_… PFOS VAN+P… 12 10 0.863 3.88e-1 4.66e-1 ns
## 6 pfos_serum4_… VAN VAN+P… 11 10 4.51 6.36e-6 1.91e-5 ****
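The adjustment method behind the `p.adj` column is not stated in the code shown here. As a hedged check, applying the Benjamini-Hochberg correction with base R `p.adjust()` to the raw p-values printed above gives values consistent with the reported adjusted column, which suggests (but does not prove) that BH was used.

```r
# Raw p-values copied from the post-hoc table above (rounded as printed)
p_raw <- c(1.15e-4, 9.86e-1, 5.87e-6, 1.23e-4, 3.88e-1, 6.36e-6)
# Adjusted p-values as reported in the same table
p_reported <- c(1.85e-4, 9.86e-1, 1.91e-5, 1.85e-4, 4.66e-1, 1.91e-5)

p_bh <- p.adjust(p_raw, method = "BH")
# Relative difference between recomputed and reported values
rel_diff <- abs(p_bh - p_reported) / p_reported
data.frame(p_raw, p_bh, p_reported, rel_diff)
```

Small differences are expected because the inputs are the rounded p-values from the printed table.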
5.1.1.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "ug/mL",limits = c(0,20),breaks = seq(0,20,5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(14,17,15,14))
p
## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 16 rows containing missing values (`geom_point()`).
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Removed 16 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
## Removed 16 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
5.1.2 Total mg in serum
5.1.2.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum4_mg"
SUBJECT <- "rat_name"
# Subset to a specific variable
dat.clean <- subset(dat, pfos == "yes")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum4_mg))
# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}
5.1.2.2 Assumptions and preliminary tests
The two-samples t-tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This is already done for the whole project.
- No significant outliers in the two groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
## [1] treatment rat_name ordering pfos
## [5] van bw_0 bw_1 bw_2
## [9] bw_3 bw_4 bw_5 bw_6
## [13] bw_7 bw_8 bw_gain cecum_wt
## [17] cecum_wt_bw cecum_norm liver_wt liver_wt_bw
## [21] liver_norm tot_pfos4 blood_vol4_mL pfos_serum4_ugml
## [25] pfos_serum4_ug pfos_serum4_mg pfos_serum4_pct tot_pfos8
## [29] blood_vol8_mL pfos_serum8_ugml pfos_serum8_ug pfos_serum8_mg
## [33] pfos_serum8_pct pfos_change48_pct pfos_liver_ugg pfos_liver_mg
## [37] pfos_liver_pct acetic formic propanoic
## [41] m2_propanoic butanoic m3_butanoic pentanoic
## [45] m4_pentanoic hexanoic heptanoic is.outlier
## [49] is.extreme
## <0 rows> (or 0-length row.names)
Any extreme outliers can be bad samples or errors in data entry. If outliers are present, compare the test with and without them to determine whether they matter, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. QQ plot draws the correlation between a given data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_serum4_mg 0.955 0.717
## 2 VAN+PFOS pfos_serum4_mg 0.930 0.446
If both Shapiro-Wilk tests have p > 0.05 and/or the QQ plot follows the reference line, the data can be treated as normally distributed.
If the data does not follow the normal distribution, run a Wilcoxon rank-sum test.
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 20 0.0274 0.870
If the p-value of the Levene’s test is significant, it suggests that
there is a significant difference between the variances of the two
groups. In such case we should use Welch t-test, which doesn’t assume
the equality of the two variances (var.equal=FALSE). If the
Levene’s test is non-significant we can perform a Student t-test
(var.equal=TRUE).
No outliers were identified. The data is normally distributed and has equal variance. Hence we use a Student t-test.
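The choice between the Student and Welch t-test comes down to the `var.equal` argument. A minimal base R sketch on simulated data (group sizes match the table above, values are made up) shows the practical difference: the Student test uses n1 + n2 - 2 degrees of freedom, while the Welch test uses the smaller Welch-Satterthwaite approximation.

```r
set.seed(4)
# Simulated stand-in for the two PFOS-dosed groups (values are made up)
toy <- data.frame(
  treatment = rep(c("PFOS", "VAN+PFOS"), times = c(12, 10)),
  pfos_mg = c(rnorm(12, 0.165, 0.03), rnorm(10, 0.183, 0.03))
)
student <- t.test(pfos_mg ~ treatment, data = toy, var.equal = TRUE)
welch <- t.test(pfos_mg ~ treatment, data = toy, var.equal = FALSE)

student$parameter  # df = n1 + n2 - 2 = 20
welch$parameter    # Welch-Satterthwaite df, never larger than 20 here
```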
5.1.2.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
t_test(FORMULA,
var.equal = EQUAL.VAR,
detailed = TRUE,
paired = FALSE,
alternative = "two.sided") %>%
add_significance()
stat.test
## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -0.0171 0.165 0.183 pfos_s… PFOS VAN+P… 12 10 -1.23 0.232
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
The output provides:
- .y.: the y variable used in the test.
- group1, group2: the compared groups in the pairwise tests.
- statistic: test statistic used to compute the p-value.
- df: degrees of freedom.
- p: p-value.
- p.adj: the adjusted p-value.
- method: the statistical test used to compare groups.
- p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively.
- estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.
- estimate1, estimate2: show the mean values of the two groups, respectively, for independent samples t-tests.
- alternative: a character string describing the alternative hypothesis.
- conf.low, conf.high: lower and upper bound on a confidence interval.
Effect size
The effect size is calculated as Cohen’s d.
## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_serum4_mg PFOS VAN+PFOS -0.528 12 10 moderate
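Cohen's d expresses the difference between two group means in units of standard deviation. The sketch below uses the pooled standard deviation, which is one common definition; whether the effect size above was computed with a pooled or an unpooled standard deviation is not visible in the code shown here, so treat this as an illustration. `cohens_d_pooled` and the toy numbers are assumptions.

```r
# Cohen's d with a pooled standard deviation (one common definition)
cohens_d_pooled <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  sd_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / sd_pooled
}

# Made-up numbers: means 2 and 3, both groups with SD 1
d_toy <- cohens_d_pooled(c(1, 2, 3), c(2, 3, 4))
d_toy
```

A negative sign only reflects group order (first group minus second group); the magnitude label is based on the absolute value.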
5.1.2.4 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mg PFOS",limits = c(0,0.30),breaks = seq(0,0.30,0.1)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(0.28))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p
# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
5.1.3 Pct.
Data for PFOS levels in serum at day 4 calculated from the total PFOS dosed at the time point.
Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum4_pct"
SUBJECT <- "rat_name"
# Subset to a specific variable
dat.clean <- subset(dat, pfos == "yes")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum4_pct))
# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}
5.1.3.1 Assumptions and preliminary tests
The two-samples t-tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This is already done for the whole project.
- No significant outliers in the two groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
## # A tibble: 1 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS R29 17 yes no 239. 243. 248. 255. 259. 267
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Any extreme outliers can be bad samples or errors in data entry. If outliers are present, compare the test with and without them to determine whether they matter, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. QQ plot draws the correlation between a given data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_serum4_pct 0.937 0.456
## 2 VAN+PFOS pfos_serum4_pct 0.939 0.547
If both Shapiro-Wilk tests have p > 0.05 and/or the QQ plot follows the reference line, the data can be treated as normally distributed.
If the data does not follow the normal distribution, run a Wilcoxon rank-sum test.
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 20 1.43 0.246
If the p-value of the Levene’s test is significant, it suggests that
there is a significant difference between the variances of the two
groups. In such case we should use Welch t-test, which doesn’t assume
the equality of the two variances (var.equal=FALSE). If the
Levene’s test is non-significant we can perform a Student t-test
(var.equal=TRUE).
5.1.3.2 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
t_test(FORMULA,
var.equal = EQUAL.VAR,
detailed = TRUE,
paired = FALSE,
alternative = "two.sided") %>%
add_significance()
stat.test
## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -0.626 6.75 7.37 pfos_s… PFOS VAN+P… 12 10 -1.17 0.254
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
The output provides:
- .y.: the y variable used in the test.
- group1, group2: the compared groups in the pairwise tests.
- statistic: test statistic used to compute the p-value.
- df: degrees of freedom.
- p: p-value.
- p.adj: the adjusted p-value.
- method: the statistical test used to compare groups.
- p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively.
- estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.
- estimate1, estimate2: show the mean values of the two groups, respectively, for independent samples t-tests.
- alternative: a character string describing the alternative hypothesis.
- conf.low, conf.high: lower and upper bound on a confidence interval.
Effect size
The effect size is calculated as Cohen’s d.
## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_serum4_pct PFOS VAN+PFOS -0.503 12 10 moderate
5.1.3.3 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% of total dosed PFOS", limits = c(3,10),breaks = seq(3,10,1)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE) #, y.position = c(1.35,1.4,1.45,1.5))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p
# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
5.2 Blood serum day 8
This section prepares and performs the data analysis for PFOS data from serum on day 8.
5.2.1 ug/mL in serum
5.2.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_serum8_ugml))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_serum8_ugml"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL pfos_serum8_ugml 12 0.016 0.03
## 2 PFOS pfos_serum8_ugml 12 36.3 15.5
## 3 VAN pfos_serum8_ugml 12 0.011 0.021
## 4 VAN+PFOS pfos_serum8_ugml 12 32.2 10.7
5.2.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This is already ensured for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 7 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R05 5 no no 214 219. 222. 229. 231. 237
## 2 CTRL R09 9 no no 273. 278. 290. 290. 296. 302
## 3 CTRL R12 12 no no 297. 304. 316. 322. 324. 334
## 4 VAN R14 26 no yes 246. 256. 260. 267. 270. 274
## 5 VAN R16 28 no yes 256. 260 268. 275. 273 279
## 6 VAN R24 36 no yes 281. 286. 294. 305. 309. 312
## 7 VAN+PFOS R47 47 yes yes 242. 249. 255. 263. 267. 271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains seven outliers: samples from rat_name R05, R09, R12, R14, R16, R24 and R47.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.823 0.00000461
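The Shapiro-Wilk table above is printed without the call that produced it. A minimal base R sketch of the same kind of check on model residuals is given here. It assumes simulated data: the `toy` data frame and its values are hypothetical and only stand in for `dat.clean`.

```r
# Simulated stand-in for dat.clean: 4 groups of 12 animals (hypothetical values)
set.seed(1)
toy <- data.frame(
  treatment = rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12),
  outcome   = c(rnorm(12, 0, 0.1), rexp(12, 1 / 35),
                rnorm(12, 0, 0.1), rexp(12, 1 / 30))
)

# Fit the one-way model and test its residuals for normality
model <- lm(outcome ~ treatment, data = toy)
sw <- shapiro.test(residuals(model))

# W statistic and p-value; a p-value below 0.05 suggests non-normal residuals
sw$statistic
sw$p.value
```

Testing the residuals of the model, rather than the raw outcome, removes the group means first, so differences between treatments are not mistaken for non-normality.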
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- Levene’s test can also be used to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 8.92 0.0000993
This shows that the day 8 serum PFOS concentration data contain outliers, have unequal variances (Levene’s test, p < 0.001) and deviate from normality (Shapiro-Wilk test, p < 0.001). We therefore use the non-parametric Kruskal-Wallis test, followed by Dunn’s post-hoc test with p-value adjustment for multiple comparisons.
5.2.1.3 Kruskal-Wallis test
5.2.1.3.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 pfos_serum8_ugml 48 37.3 3 0.0000000402 Kruskal-Wallis
5.2.1.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in the published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 pfos_serum8_ugml 48 0.779 eta2[H] large
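For reference, the H-based eta squared is commonly computed as (H - k + 1) / (n - k), where H is the Kruskal-Wallis statistic, k the number of groups and n the total number of observations. The sketch below applies this formula to the rounded values printed above (H = 37.3, k = 4, n = 48). `eta2_h` is a hypothetical helper written for illustration, not an rstatix function.

```r
# Eta squared based on the Kruskal-Wallis H statistic
eta2_h <- function(h, k, n) (h - k + 1) / (n - k)

# Values taken from the Kruskal-Wallis output above (H is rounded)
eta2 <- eta2_h(h = 37.3, k = 4, n = 48)
eta2

# Classify with the thresholds quoted in the text;
# values below 0.01 are left unclassified (NA) in this sketch
magnitude <- cut(eta2,
                 breaks = c(0.01, 0.06, 0.14, Inf),
                 labels = c("small", "moderate", "large"),
                 right = FALSE)
magnitude
```

The result agrees with the reported effect size up to the rounding of H.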
5.2.1.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups differ. Wilcoxon’s test can also be used to calculate pairwise comparisons between group levels, with correction for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 pfos_serum8_… CTRL PFOS 12 12 4.37 1.26e-5 3.78e-5 ****
## 2 pfos_serum8_… CTRL VAN 12 12 -0.0899 9.28e-1 9.28e-1 ns
## 3 pfos_serum8_… CTRL VAN+P… 12 12 4.17 3.02e-5 4.53e-5 ****
## 4 pfos_serum8_… PFOS VAN 12 12 -4.46 8.32e-6 3.78e-5 ****
## 5 pfos_serum8_… PFOS VAN+P… 12 12 -0.195 8.46e-1 9.28e-1 ns
## 6 pfos_serum8_… VAN VAN+P… 12 12 4.26 2.03e-5 4.05e-5 ****
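The adjustment method behind the `p.adj` column is not stated in the rendered output. The printed values are consistent with a Benjamini-Hochberg correction; this is an inference from the numbers, not something the source confirms. A short sketch that applies `p.adjust()` to the raw p-values from the table:

```r
# Raw p-values copied from the Dunn's test table above (rows 1 to 6)
p_raw <- c(1.26e-5, 9.28e-1, 3.02e-5, 8.32e-6, 8.46e-1, 2.03e-5)

# Benjamini-Hochberg adjustment (assumed method, see lead-in)
p_adj <- p.adjust(p_raw, method = "BH")

# Compare with the p.adj column; small differences come from rounded inputs
signif(p_adj, 3)
```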
5.2.1.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "ug/mL",limits = c(0,80),breaks = seq(0,80,10)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(72,80,75,72))
p
## Warning: Removed 11 rows containing missing values (`geom_point()`).
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 11 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 11 rows containing missing values (`geom_point()`).
# Clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden objects
invisible(gc()) # free up memory (the usage report is suppressed)
5.2.2 Total mg in serum
5.2.2.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum8_mg"
SUBJECT <- "rat_name"
# Subset to a specific variable
dat.clean <- subset(dat, pfos == "yes")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum8_mg))
# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}
5.2.2.2 Assumptions and preliminary tests
The two-sample t-test assumes the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group, and there is no relationship between the observations in each group. This has already been checked for the whole project.
- No significant outliers in the two groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
## [1] treatment rat_name ordering pfos
## [5] van bw_0 bw_1 bw_2
## [9] bw_3 bw_4 bw_5 bw_6
## [13] bw_7 bw_8 bw_gain cecum_wt
## [17] cecum_wt_bw cecum_norm liver_wt liver_wt_bw
## [21] liver_norm tot_pfos4 blood_vol4_mL pfos_serum4_ugml
## [25] pfos_serum4_ug pfos_serum4_mg pfos_serum4_pct tot_pfos8
## [29] blood_vol8_mL pfos_serum8_ugml pfos_serum8_ug pfos_serum8_mg
## [33] pfos_serum8_pct pfos_change48_pct pfos_liver_ugg pfos_liver_mg
## [37] pfos_liver_pct acetic formic propanoic
## [41] m2_propanoic butanoic m3_butanoic pentanoic
## [45] m4_pentanoic hexanoic heptanoic is.outlier
## [49] is.extreme
## <0 rows> (or 0-length row.names)
Any extreme outliers can be bad samples or errors in data entry. If outliers are present, compare the test with and without the outliers to determine whether they matter, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. A QQ plot shows the correlation between the sample data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_serum8_mg 0.890 0.119
## 2 VAN+PFOS pfos_serum8_mg 0.902 0.168
If both Shapiro-Wilk tests have p > 0.05 and/or the QQ plots follow the reference line, the data can be treated as normally distributed.
If the data do not follow a normal distribution, run a Wilcoxon rank-sum test instead.
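The per-group Shapiro-Wilk table above is shown without its code. A minimal base R sketch of a per-group check, assuming simulated data (the `toy` data frame is hypothetical and stands in for `dat.clean`):

```r
# Simulated stand-in for dat.clean: two groups of 12 animals (hypothetical values)
set.seed(2)
toy <- data.frame(
  treatment = rep(c("PFOS", "VAN+PFOS"), each = 12),
  outcome   = c(rnorm(12, 0.72, 0.25), rnorm(12, 0.62, 0.25))
)

# Shapiro-Wilk p-value for each treatment group
sw_by_group <- sapply(split(toy$outcome, toy$treatment),
                      function(x) shapiro.test(x)$p.value)
sw_by_group

# TRUE only if no group rejects normality at the 0.05 level
all_normal <- all(sw_by_group > 0.05)
all_normal
```

With 12 animals per group the test has limited power, which is one more reason to read it together with the QQ plots.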
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 1.98 0.173
If the p-value of the Levene’s test is significant, it suggests that
there is a significant difference between the variances of the two
groups. In such case we should use Welch t-test, which doesn’t assume
the equality of the two variances (var.equal=FALSE). If the
Levene’s test is non-significant we can perform a Student t-test
(var.equal=TRUE).
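The decision rule above can be sketched in base R. Levene’s test with median centring is equivalent to a one-way ANOVA on the absolute deviations from each group median. The t-test call in this document passes a variable named `EQUAL.VAR`; how it is set is not shown, so deriving it from the Levene p-value with a 0.05 cut-off is an assumption, and the `toy` data are simulated.

```r
# Simulated stand-in for dat.clean: two groups of 12 animals (hypothetical values)
set.seed(42)
toy <- data.frame(
  treatment = rep(c("PFOS", "VAN+PFOS"), each = 12),
  outcome   = c(rnorm(12, mean = 0.7, sd = 0.2), rnorm(12, mean = 0.6, sd = 0.2))
)

# Levene's test (median-centred): ANOVA on absolute deviations from the group median
toy$abs_dev <- abs(toy$outcome - ave(toy$outcome, toy$treatment, FUN = median))
levene <- anova(lm(abs_dev ~ treatment, data = toy))
levene_p <- levene[["Pr(>F)"]][1]

# Assumed rule: equal variances unless Levene's test is significant
EQUAL.VAR <- levene_p > 0.05

# Student's t-test if EQUAL.VAR is TRUE, Welch's t-test otherwise
res <- t.test(outcome ~ treatment, data = toy, var.equal = EQUAL.VAR)
res$p.value
```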
No outliers were identified. Both groups pass the Shapiro-Wilk test (p > 0.05) and Levene’s test indicates equal variances (p = 0.173). Hence we use Student’s t-test.
5.2.2.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 0.0931 0.717 0.624 pfos_s… PFOS VAN+P… 12 12 0.853 0.403
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
Effect size
The effect size is calculated as Cohen’s d.
## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_serum8_mg PFOS VAN+PFOS 0.348 12 12 small
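As a sketch of what this effect size measures, Cohen’s d for two independent groups with equal variances is the mean difference divided by the pooled standard deviation. For a Student’s t-test it also equals t * sqrt(1/n1 + 1/n2). The helper `cohens_d_pooled` and the simulated vectors are hypothetical; the pooled-SD form is an assumption that fits the numbers reported above.

```r
# Cohen's d with pooled standard deviation (equal-variance case)
cohens_d_pooled <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / sp
}

# Simulated groups (hypothetical values)
set.seed(3)
x <- rnorm(12, 0.72, 0.25)
y <- rnorm(12, 0.62, 0.25)
d <- cohens_d_pooled(x, y)

# Same quantity recovered from the Student's t statistic
tt <- t.test(x, y, var.equal = TRUE)
d_from_t <- unname(tt$statistic) * sqrt(1 / length(x) + 1 / length(y))

# Applying the identity to the t statistic reported above (t = 0.853, n1 = n2 = 12)
d_reported <- 0.853 * sqrt(1 / 12 + 1 / 12)
d_reported
```

The value recovered from the reported t statistic matches the effect size in the table to three decimals.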
5.2.2.4 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mg PFOS",limits = c(0,2),breaks = seq(0,2,0.5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(1.75))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2
# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# Clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden objects
invisible(gc()) # free up memory (the usage report is suppressed)
5.2.3 Pct.
5.2.3.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_serum8_pct"
SUBJECT <- "rat_name"
# Subset to PFOS-dosed animals, excluding rat R47
dat.clean <- subset(dat, pfos == "yes" & rat_name != "R47")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_serum8_pct))
# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}
5.2.3.2 Assumptions and preliminary tests
The two-sample t-test assumes the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group, and there is no relationship between the observations in each group. This has already been checked for the whole project.
- No significant outliers in the two groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
## [1] treatment rat_name ordering pfos
## [5] van bw_0 bw_1 bw_2
## [9] bw_3 bw_4 bw_5 bw_6
## [13] bw_7 bw_8 bw_gain cecum_wt
## [17] cecum_wt_bw cecum_norm liver_wt liver_wt_bw
## [21] liver_norm tot_pfos4 blood_vol4_mL pfos_serum4_ugml
## [25] pfos_serum4_ug pfos_serum4_mg pfos_serum4_pct tot_pfos8
## [29] blood_vol8_mL pfos_serum8_ugml pfos_serum8_ug pfos_serum8_mg
## [33] pfos_serum8_pct pfos_change48_pct pfos_liver_ugg pfos_liver_mg
## [37] pfos_liver_pct acetic formic propanoic
## [41] m2_propanoic butanoic m3_butanoic pentanoic
## [45] m4_pentanoic hexanoic heptanoic is.outlier
## [49] is.extreme
## <0 rows> (or 0-length row.names)
Any extreme outliers can be bad samples or errors in data entry. If outliers are present, compare the test with and without the outliers to determine whether they matter, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. A QQ plot shows the correlation between the sample data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_serum8_pct 0.867 0.0594
## 2 VAN+PFOS pfos_serum8_pct 0.899 0.181
If both Shapiro-Wilk tests have p > 0.05 and/or the QQ plots follow the reference line, the data can be treated as normally distributed.
If the data do not follow a normal distribution, run a Wilcoxon rank-sum test instead.
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 21 2.86 0.106
If the p-value of the Levene’s test is significant, it suggests that
there is a significant difference between the variances of the two
groups. In such case we should use Welch t-test, which doesn’t assume
the equality of the two variances (var.equal=FALSE). If the
Levene’s test is non-significant we can perform a Student t-test
(var.equal=TRUE).
After excluding R47, no outliers were identified. Both groups pass the Shapiro-Wilk test (p > 0.05) and Levene’s test indicates equal variances (p = 0.106). Hence we use Student’s t-test.
5.2.3.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 2.05 11.8 9.74 pfos_s… PFOS VAN+P… 12 11 1.29 0.212
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
Effect size
The effect size is calculated as Cohen’s d.
## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_serum8_pct PFOS VAN+PFOS 0.537 12 11 moderate
5.2.3.4 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% of total dosed PFOS", limits = c(5,25),breaks = seq(5,25,5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(24))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2
# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# Clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden objects
invisible(gc()) # free up memory (the usage report is suppressed)
5.3 Blood serum day 4 and 8
This section analyses the PFOS data from serum collected on day 4 and day 8.
5.3.1 Change from day 4 to 8 (Pct.)
5.3.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_change48_pct"
SUBJECT <- "rat_name"
# Subset to a specific variable
dat.clean <- subset(dat, pfos == "yes") # add following to subset() to remove the outliers: & !rat_name %in% c("R47","R27"))
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_change48_pct))
# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}
5.3.1.2 Assumptions and preliminary tests
The two-sample t-test assumes the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group, and there is no relationship between the observations in each group. This has already been checked for the whole project.
- No significant outliers in the two groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS R27 15 yes no 270. 280. 284. 291. 290. 296
## 2 VAN+PFOS R47 47 yes yes 242. 249. 255. 263. 267. 271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Any extreme outliers can be bad samples or errors in data entry. If outliers are present, compare the test with and without the outliers to determine whether they matter, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. A QQ plot shows the correlation between the sample data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_change48_pct 0.894 0.132
## 2 VAN+PFOS pfos_change48_pct 0.805 0.0167
If both Shapiro-Wilk tests have p > 0.05 and/or the QQ plots follow the reference line, the data can be treated as normally distributed.
If the data do not follow a normal distribution, run a Wilcoxon rank-sum test instead.
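The non-parametric fallback mentioned above can be sketched in base R with `wilcox.test()`. The data are simulated and hypothetical; the group sizes (12 and 10) mirror this section, but the values do not come from the project.

```r
# Simulated stand-in for dat.clean: skewed outcome, 12 and 10 animals (hypothetical values)
set.seed(4)
toy <- data.frame(
  treatment = rep(c("PFOS", "VAN+PFOS"), times = c(12, 10)),
  outcome   = c(rlnorm(12, log(300), 0.4), rlnorm(10, log(250), 0.4))
)

# Wilcoxon rank-sum (Mann-Whitney) test, two-sided by default
wt <- wilcox.test(outcome ~ treatment, data = toy)
wt$statistic
wt$p.value
```

Running both this test and the t-test, with and without the flagged animals, is one way to check whether the conclusion depends on the normality assumption.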
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 20 0.639 0.434
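Two outliers were identified (samples from R27 and R47). The analysis result and the choice of test are similar with and without the outliers. Levene’s test indicates equal variances (p = 0.434). With the outliers included, the Shapiro-Wilk p-value for the VAN+PFOS group is below 0.05 (p = 0.0167), so normality is only approximate for that group. We proceed with Student’s t-test.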
5.3.1.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 85.1 341. 256. pfos_c… PFOS VAN+P… 12 10 1.11 0.281
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
Effect size
The effect size is calculated as Cohen’s d.
## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_change48_pct PFOS VAN+PFOS 0.475 12 10 small
5.3.1.4 Conclusion
5.3.1.5 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create point plot with mean and SD
data_summary <- function(x) {
m <- mean(x)
ymin <- m-sd(x)
ymax <- m+sd(x)
return(c(y=m,ymin=ymin,ymax=ymax))
}
data_summary_collapsed <- function(x) {
m <- mean(x)
ymin <- m
ymax <- m
return(c(y=m,ymin=ymin,ymax=ymax))
}
p <- ggplot(dat.clean, aes(x = .data[[PREDICTOR]], y = .data[[OUTCOME]], color = .data[[PREDICTOR]])) +
stat_summary(fun.data = data_summary_collapsed, geom = "crossbar", color = "black", width = 0.5, linewidth = 0.3) +
stat_summary(fun.data = data_summary, geom = "errorbar", color = "black", width = 0.15, linewidth = 0.5) +
geom_point(position = position_jitterdodge(dodge.width = 0.6, jitter.width = 0.4), size = 2, colour = "black", shape = 21, stroke = 0.5, aes(fill = treatment)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% change", limits = c(100,900),breaks = seq(100,900,100), labels = function(x) paste0(x, "%")) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme_pubr()
p
# Alternative: Create boxplot
# p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
# fill = PREDICTOR,
# add = "jitter",
# add.params = list(size = 1)) +
# scale_fill_manual(values = params$COL) +
# scale_y_continuous(name = "% change", limits = c(100,900),breaks = seq(100,900,100)) +
# labs(fill = "Treatment") +
# scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE) #, y.position = c(1.35,1.4,1.45,1.5))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2
# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 70, height = 100)
# Clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden objects
invisible(gc()) # free up memory (the usage report is suppressed)
5.3.2 Data ug/mL
5.3.2.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Color scheme
COL <- c("#61d46b","#ffe900","#31b44b","#efc000")
# Subset data
dat.sub <- subset(dat, pfos == "yes")
# Create data frame for data representation
dat.clean <- dat.sub %>% select(rat_name, treatment, pfos_serum4_ugml, pfos_serum8_ugml) %>%
pivot_longer(., cols = c(pfos_serum4_ugml, pfos_serum8_ugml), names_to = "data_group", values_to = "conc")
# Create column for day of sampling
dat.clean <- transform(dat.clean, "day" = ifelse(dat.clean$data_group == "pfos_serum8_ugml","d8","d4"))
# Create ID column for easier handling (paste() is vectorised, so no loop is needed)
dat.clean$ID <- paste(dat.clean$day,"_",dat.clean$treatment)
# Order dataframe for analysis
dat.clean <- dat.clean[order(dat.clean$day),]
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(conc))
dat.clean
## rat_name treatment data_group conc day ID
## 1 R25 PFOS pfos_serum4_ugml 8.56 d4 d4 _ PFOS
## 3 R26 PFOS pfos_serum4_ugml 9.60 d4 d4 _ PFOS
## 5 R27 PFOS pfos_serum4_ugml 7.92 d4 d4 _ PFOS
## 7 R28 PFOS pfos_serum4_ugml 8.64 d4 d4 _ PFOS
## 9 R29 PFOS pfos_serum4_ugml 12.96 d4 d4 _ PFOS
## 11 R30 PFOS pfos_serum4_ugml 8.68 d4 d4 _ PFOS
## 13 R31 PFOS pfos_serum4_ugml 5.72 d4 d4 _ PFOS
## 15 R32 PFOS pfos_serum4_ugml 7.56 d4 d4 _ PFOS
## 17 R33 PFOS pfos_serum4_ugml 8.24 d4 d4 _ PFOS
## 19 R34 PFOS pfos_serum4_ugml 11.36 d4 d4 _ PFOS
## 21 R35 PFOS pfos_serum4_ugml 9.00 d4 d4 _ PFOS
## 23 R36 PFOS pfos_serum4_ugml 11.84 d4 d4 _ PFOS
## 25 R37 VAN+PFOS pfos_serum4_ugml 9.36 d4 d4 _ VAN+PFOS
## 27 R38 VAN+PFOS pfos_serum4_ugml 11.76 d4 d4 _ VAN+PFOS
## 29 R39 VAN+PFOS pfos_serum4_ugml 10.64 d4 d4 _ VAN+PFOS
## 31 R40 VAN+PFOS pfos_serum4_ugml 12.12 d4 d4 _ VAN+PFOS
## 33 R41 VAN+PFOS pfos_serum4_ugml 9.12 d4 d4 _ VAN+PFOS
## 35 R42 VAN+PFOS pfos_serum4_ugml 9.80 d4 d4 _ VAN+PFOS
## 37 R43 VAN+PFOS pfos_serum4_ugml 10.08 d4 d4 _ VAN+PFOS
## 43 R46 VAN+PFOS pfos_serum4_ugml 9.88 d4 d4 _ VAN+PFOS
## 45 R47 VAN+PFOS pfos_serum4_ugml 8.80 d4 d4 _ VAN+PFOS
## 47 R48 VAN+PFOS pfos_serum4_ugml 8.60 d4 d4 _ VAN+PFOS
## 2 R25 PFOS pfos_serum8_ugml 43.92 d8 d8 _ PFOS
## 4 R26 PFOS pfos_serum8_ugml 45.60 d8 d8 _ PFOS
## 6 R27 PFOS pfos_serum8_ugml 69.92 d8 d8 _ PFOS
## 8 R28 PFOS pfos_serum8_ugml 21.92 d8 d8 _ PFOS
## 10 R29 PFOS pfos_serum8_ugml 26.56 d8 d8 _ PFOS
## 12 R30 PFOS pfos_serum8_ugml 47.84 d8 d8 _ PFOS
## 14 R31 PFOS pfos_serum8_ugml 21.84 d8 d8 _ PFOS
## 16 R32 PFOS pfos_serum8_ugml 29.36 d8 d8 _ PFOS
## 18 R33 PFOS pfos_serum8_ugml 22.08 d8 d8 _ PFOS
## 20 R34 PFOS pfos_serum8_ugml 52.48 d8 d8 _ PFOS
## 22 R35 PFOS pfos_serum8_ugml 29.84 d8 d8 _ PFOS
## 24 R36 PFOS pfos_serum8_ugml 23.76 d8 d8 _ PFOS
## 26 R37 VAN+PFOS pfos_serum8_ugml 36.08 d8 d8 _ VAN+PFOS
## 28 R38 VAN+PFOS pfos_serum8_ugml 31.28 d8 d8 _ VAN+PFOS
## 30 R39 VAN+PFOS pfos_serum8_ugml 25.92 d8 d8 _ VAN+PFOS
## 32 R40 VAN+PFOS pfos_serum8_ugml 23.92 d8 d8 _ VAN+PFOS
## 34 R41 VAN+PFOS pfos_serum8_ugml 21.68 d8 d8 _ VAN+PFOS
## 36 R42 VAN+PFOS pfos_serum8_ugml 40.96 d8 d8 _ VAN+PFOS
## 38 R43 VAN+PFOS pfos_serum8_ugml 25.60 d8 d8 _ VAN+PFOS
## 40 R44 VAN+PFOS pfos_serum8_ugml 37.44 d8 d8 _ VAN+PFOS
## 42 R45 VAN+PFOS pfos_serum8_ugml 25.36 d8 d8 _ VAN+PFOS
## 44 R46 VAN+PFOS pfos_serum8_ugml 34.72 d8 d8 _ VAN+PFOS
## 46 R47 VAN+PFOS pfos_serum8_ugml 59.52 d8 d8 _ VAN+PFOS
## 48 R48 VAN+PFOS pfos_serum8_ugml 23.76 d8 d8 _ VAN+PFOS
# Set names of variables
PREDICTOR <- "ID"
OUTCOME <- "conc"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## ID variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 d4 _ PFOS conc 12 9.17 2.01
## 2 d4 _ VAN+PFOS conc 10 10.0 1.18
## 3 d8 _ PFOS conc 12 36.3 15.5
## 4 d8 _ VAN+PFOS conc 12 32.2 10.7
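The summary table can be cross-checked by hand against the data listing above. The sketch below recomputes n, mean and SD for the `d4 _ VAN+PFOS` group in base R, using the ten day 4 concentrations copied from the listing; the object names are chosen for illustration.

```r
# Day 4 serum concentrations (ug/mL) for VAN+PFOS, copied from the listing above
d4_van_pfos <- c(9.36, 11.76, 10.64, 12.12, 9.12, 9.80, 10.08, 9.88, 8.80, 8.60)

# Sample size, mean and standard deviation for the group
stats <- c(n    = length(d4_van_pfos),
           mean = mean(d4_van_pfos),
           sd   = sd(d4_van_pfos))
stats
```

The values agree with the corresponding row of the summary table to within display rounding.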
5.3.2.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = COL)
bxp
5.3.2.3 Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group, and there is no relationship between the observations in each group. This has already been checked for the whole project.
- No significant outliers in any of the groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
  group_by(across(all_of(PREDICTOR))) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 8
## ID rat_name treatment data_group conc day is.outlier is.extreme
## <chr> <chr> <chr> <chr> <dbl> <chr> <lgl> <lgl>
## 1 d4 _ PFOS R29 PFOS pfos_serum… 13.0 d4 TRUE FALSE
## 2 d8 _ VAN+PFOS R47 VAN+PFOS pfos_serum… 59.5 d8 TRUE FALSE
Data contains two outliers: the samples from R29 (day 4, PFOS) and R47 (day 8, VAN+PFOS). Neither is an extreme outlier.
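The boxplot rule behind this table can be reproduced by hand. As described in the rstatix documentation, values beyond 1.5 x IQR from the quartiles are treated as outliers and values beyond 3 x IQR as extreme points. The sketch below applies that rule to the day 8 VAN+PFOS concentrations copied from the data listing above; `flag_outliers` is a hypothetical helper, not an rstatix function.

```r
# Flag values outside [Q1 - coef * IQR, Q3 + coef * IQR]
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Day 8 serum concentrations (ug/mL) for VAN+PFOS, copied from the listing above
d8_van_pfos <- c(R37 = 36.08, R38 = 31.28, R39 = 25.92, R40 = 23.92,
                 R41 = 21.68, R42 = 40.96, R43 = 25.60, R44 = 37.44,
                 R45 = 25.36, R46 = 34.72, R47 = 59.52, R48 = 23.76)

is_outlier <- flag_outliers(d8_van_pfos, coef = 1.5)  # ordinary outliers
is_extreme <- flag_outliers(d8_van_pfos, coef = 3)    # extreme points

# Names of the flagged animals
names(d8_van_pfos)[is_outlier]
names(d8_van_pfos)[is_extreme]
```

For this group only R47 is flagged, and it is not extreme, which matches the `is.outlier` and `is.extreme` columns in the output.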
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.875 0.000151
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- Levene’s test can also be used to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 42 6.46 0.00108
This shows that the PFOS concentrations from day 4 and day 8 contain two outliers, have unequal variances (Levene’s test, p = 0.00108) and deviate from normality (Shapiro-Wilk test, p < 0.001). We therefore use the non-parametric Kruskal-Wallis test, followed by Dunn’s post-hoc test with p-value adjustment for multiple comparisons.
5.3.2.4 Kruskal-Wallis test
5.3.2.4.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 conc 46 34.4 3 0.000000165 Kruskal-Wallis
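The code for this test is not shown in the rendered output. As a sketch, the statistic can be recomputed in base R from the concentrations in the data listing above with `kruskal.test()`; the list and object names are chosen for illustration. The H-based eta squared is then derived with the commonly used formula (H - k + 1) / (n - k).

```r
# Serum concentrations (ug/mL) per day and treatment, copied from the listing above
conc <- list(
  d4_PFOS     = c(8.56, 9.60, 7.92, 8.64, 12.96, 8.68, 5.72, 7.56, 8.24, 11.36, 9.00, 11.84),
  d4_VAN_PFOS = c(9.36, 11.76, 10.64, 12.12, 9.12, 9.80, 10.08, 9.88, 8.80, 8.60),
  d8_PFOS     = c(43.92, 45.60, 69.92, 21.92, 26.56, 47.84, 21.84, 29.36, 22.08, 52.48, 29.84, 23.76),
  d8_VAN_PFOS = c(36.08, 31.28, 25.92, 23.92, 21.68, 40.96, 25.60, 37.44, 25.36, 34.72, 59.52, 23.76)
)

# Kruskal-Wallis rank sum test across the four groups
kw <- kruskal.test(conc)
kw$statistic
kw$p.value

# Effect size: eta squared based on H
n <- sum(lengths(conc))  # total number of observations
k <- length(conc)        # number of groups
eta2 <- (unname(kw$statistic) - k + 1) / (n - k)
eta2
```

The recomputed statistic and effect size can be compared directly with the two output tables in this section.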
5.3.2.4.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in the published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 conc 46 0.747 eta2[H] large
5.3.2.4.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups differ. Wilcoxon’s test can also be used to calculate pairwise comparisons between group levels, with correction for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 conc d4 _ PFOS d4 _ V… 12 10 0.798 4.25e-1 5.10e-1 ns
## 2 conc d4 _ PFOS d8 _ P… 12 12 4.68 2.92e-6 1.75e-5 ****
## 3 conc d4 _ PFOS d8 _ V… 12 12 4.48 7.51e-6 2.25e-5 ****
## 4 conc d4 _ VAN+PFOS d8 _ P… 10 12 3.66 2.51e-4 5.02e-4 ***
## 5 conc d4 _ VAN+PFOS d8 _ V… 10 12 3.47 5.15e-4 7.73e-4 ***
## 6 conc d8 _ PFOS d8 _ V… 12 12 -0.198 8.43e-1 8.43e-1 ns
5.3.2.5 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = COL,labels = c("PFOS day 4","VAN+PFOS day 4","PFOS day 8","VAN+PFOS day 8")) +
scale_y_continuous(name = "ug/mL",limits = c(0,85),breaks = seq(0,85,10)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment", labels = c("PFOS\nDay 4","VAN+PFOS\nDay 4","PFOS\nDay 8","VAN+PFOS\nDay 8")) +
theme(axis.title.x = element_blank())
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(75,85,70,80))
p
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
5.3.3 Data mg
5.3.3.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Color scheme
COL <- c("#61d46b","#ffe900","#31b44b","#efc000")
# Subset data
dat.sub <- subset(dat, pfos == "yes")
# Create data frame for data representation
dat.clean <- dat.sub %>% select(rat_name, treatment, pfos_serum4_mg, pfos_serum8_mg) %>%
pivot_longer(., cols = c(pfos_serum4_mg, pfos_serum8_mg), names_to = "data_group", values_to = "mg")
# Create column for day of sampling
dat.clean <- transform(dat.clean, "day" = ifelse(dat.clean$data_group == "pfos_serum8_mg","d8","d4"))
# Create ID column for easier handling
dat.clean$ID <- paste(dat.clean$day,"_",dat.clean$treatment) # vectorised, so no loop over rats is needed
# Order dataframe for analysis
dat.clean <- dat.clean[order(dat.clean$day),]
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(mg))
dat.clean
## rat_name treatment data_group mg day ID
## 1 R25 PFOS pfos_serum4_mg 0.1904292 d4 d4 _ PFOS
## 3 R26 PFOS pfos_serum4_mg 0.1592525 d4 d4 _ PFOS
## 5 R27 PFOS pfos_serum4_mg 0.1472486 d4 d4 _ PFOS
## 7 R28 PFOS pfos_serum4_mg 0.1480827 d4 d4 _ PFOS
## 9 R29 PFOS pfos_serum4_mg 0.2149079 d4 d4 _ PFOS
## 11 R30 PFOS pfos_serum4_mg 0.1498237 d4 d4 _ PFOS
## 13 R31 PFOS pfos_serum4_mg 0.1123500 d4 d4 _ PFOS
## 15 R32 PFOS pfos_serum4_mg 0.1359590 d4 d4 _ PFOS
## 17 R33 PFOS pfos_serum4_mg 0.1546747 d4 d4 _ PFOS
## 19 R34 PFOS pfos_serum4_mg 0.2121503 d4 d4 _ PFOS
## 21 R35 PFOS pfos_serum4_mg 0.1792512 d4 d4 _ PFOS
## 23 R36 PFOS pfos_serum4_mg 0.1811804 d4 d4 _ PFOS
## 25 R37 VAN+PFOS pfos_serum4_mg 0.1927112 d4 d4 _ VAN+PFOS
## 27 R38 VAN+PFOS pfos_serum4_mg 0.2344474 d4 d4 _ VAN+PFOS
## 29 R39 VAN+PFOS pfos_serum4_mg 0.1721467 d4 d4 _ VAN+PFOS
## 31 R40 VAN+PFOS pfos_serum4_mg 0.2381338 d4 d4 _ VAN+PFOS
## 33 R41 VAN+PFOS pfos_serum4_mg 0.1657651 d4 d4 _ VAN+PFOS
## 35 R42 VAN+PFOS pfos_serum4_mg 0.1652672 d4 d4 _ VAN+PFOS
## 37 R43 VAN+PFOS pfos_serum4_mg 0.2037289 d4 d4 _ VAN+PFOS
## 43 R46 VAN+PFOS pfos_serum4_mg 0.1702838 d4 d4 _ VAN+PFOS
## 45 R47 VAN+PFOS pfos_serum4_mg 0.1501491 d4 d4 _ VAN+PFOS
## 47 R48 VAN+PFOS pfos_serum4_mg 0.1332518 d4 d4 _ VAN+PFOS
## 2 R25 PFOS pfos_serum8_mg 1.0930000 d8 d8 _ PFOS
## 4 R26 PFOS pfos_serum8_mg 0.8230000 d8 d8 _ PFOS
## 6 R27 PFOS pfos_serum8_mg 1.3690000 d8 d8 _ PFOS
## 8 R28 PFOS pfos_serum8_mg 0.3980000 d8 d8 _ PFOS
## 10 R29 PFOS pfos_serum8_mg 0.4810000 d8 d8 _ PFOS
## 12 R30 PFOS pfos_serum8_mg 0.8450000 d8 d8 _ PFOS
## 14 R31 PFOS pfos_serum8_mg 0.4720000 d8 d8 _ PFOS
## 16 R32 PFOS pfos_serum8_mg 0.5750000 d8 d8 _ PFOS
## 18 R33 PFOS pfos_serum8_mg 0.4550000 d8 d8 _ PFOS
## 20 R34 PFOS pfos_serum8_mg 1.0550000 d8 d8 _ PFOS
## 22 R35 PFOS pfos_serum8_mg 0.6510000 d8 d8 _ PFOS
## 24 R36 PFOS pfos_serum8_mg 0.3890000 d8 d8 _ PFOS
## 26 R37 VAN+PFOS pfos_serum8_mg 0.7850000 d8 d8 _ VAN+PFOS
## 28 R38 VAN+PFOS pfos_serum8_mg 0.6670000 d8 d8 _ VAN+PFOS
## 30 R39 VAN+PFOS pfos_serum8_mg 0.4530000 d8 d8 _ VAN+PFOS
## 32 R40 VAN+PFOS pfos_serum8_mg 0.5010000 d8 d8 _ VAN+PFOS
## 34 R41 VAN+PFOS pfos_serum8_mg 0.4270000 d8 d8 _ VAN+PFOS
## 36 R42 VAN+PFOS pfos_serum8_mg 0.7600000 d8 d8 _ VAN+PFOS
## 38 R43 VAN+PFOS pfos_serum8_mg 0.5540000 d8 d8 _ VAN+PFOS
## 40 R44 VAN+PFOS pfos_serum8_mg 0.7520000 d8 d8 _ VAN+PFOS
## 42 R45 VAN+PFOS pfos_serum8_mg 0.4590000 d8 d8 _ VAN+PFOS
## 44 R46 VAN+PFOS pfos_serum8_mg 0.6440000 d8 d8 _ VAN+PFOS
## 46 R47 VAN+PFOS pfos_serum8_mg 1.0890000 d8 d8 _ VAN+PFOS
## 48 R48 VAN+PFOS pfos_serum8_mg 0.3980000 d8 d8 _ VAN+PFOS
# Set names of variables
PREDICTOR <- "ID"
OUTCOME <- "mg"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## ID variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 d4 _ PFOS mg 12 0.165 0.031
## 2 d4 _ VAN+PFOS mg 10 0.183 0.034
## 3 d8 _ PFOS mg 12 0.717 0.32
## 4 d8 _ VAN+PFOS mg 12 0.624 0.201
5.3.3.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = COL)
bxp
5.3.3.3 Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This has already been checked for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## [1] ID rat_name treatment data_group mg day is.outlier
## [8] is.extreme
## <0 rows> (or 0-length row.names)
The output has zero rows, so no outliers were identified in any group.
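The boxplot rule behind `identify_outliers()` can be written out in base R: a value is flagged as an outlier when it lies more than 1.5 x IQR outside the quartiles, and as extreme when it lies more than 3 x IQR outside. The helper name `flag_outliers` is hypothetical and only illustrates the rule; the input values are the d4 _ PFOS rows copied from the table above.

```r
# Flag boxplot outliers within one group (illustrative helper, base R only)
flag_outliers <- function(x) {
  q   <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  data.frame(
    value      = x,
    is.outlier = x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr,
    is.extreme = x < q[1] - 3 * iqr   | x > q[2] + 3 * iqr
  )
}

# Serum PFOS (mg) for the d4 _ PFOS group, copied from the data table above
d4_pfos <- c(0.1904292, 0.1592525, 0.1472486, 0.1480827, 0.2149079, 0.1498237,
             0.1123500, 0.1359590, 0.1546747, 0.2121503, 0.1792512, 0.1811804)

out <- flag_outliers(d4_pfos)
sum(out$is.outlier)   # number of flagged values in this group
```

For this group no value falls outside the fences, which agrees with the empty result above.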
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.896 0.000640
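The call that produced the Shapiro-Wilk table is not echoed. A minimal base R sketch of the same idea, testing the residuals of the linear model rather than the raw values, is shown below. The toy data are made up for illustration; `rstatix::shapiro_test()` wraps the same test and returns a tibble.

```r
# Toy data (made up for illustration, not the study data)
toy <- data.frame(
  ID = rep(c("g1", "g2", "g3", "g4"), each = 5),
  mg = c(0.15, 0.16, 0.19, 0.21, 0.14,
         0.17, 0.23, 0.20, 0.13, 0.18,
         1.09, 0.82, 0.40, 0.48, 0.85,
         0.79, 0.67, 0.45, 0.50, 0.43)
)

# Fit the one-way model and test its residuals for normality
model <- lm(mg ~ ID, data = toy)
sw <- shapiro.test(residuals(model))
sw
```

A p-value below 0.05 indicates that the residuals deviate from a normal distribution, which is the situation reported above (p = 0.000640).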
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It's also possible to use Levene's test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 42 9.67 0.0000564
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
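Levene's test with median centring amounts to a one-way ANOVA on the absolute deviations of each value from its group median. The sketch below writes this out in base R; the function name `levene_median` and the toy data are hypothetical. The "group coerced to factor" warning in the output above appears because the grouping column is a character vector; converting it with `factor()` first, as the helper does, avoids it.

```r
# Levene's test (median-centred), written out in base R
levene_median <- function(y, g) {
  g <- factor(g)                            # explicit factor, no coercion warning
  z <- abs(y - ave(y, g, FUN = median))     # absolute deviation from group median
  anova(lm(z ~ g))                          # one-way ANOVA on the deviations
}

# Toy data (made up): one tight group and one widely spread group
toy_y <- c(1.0, 1.1, 0.9, 1.2, 0.8,
           2.0, 3.5, 0.5, 4.0, 1.0)
toy_g <- rep(c("low_var", "high_var"), each = 5)

res <- levene_median(toy_y, toy_g)
res[1, c("Df", "F value", "Pr(>F)")]
```

A small p-value indicates unequal variances between groups, as found for the serum data above.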
This shows that total PFOS in serum (mg) on days 4 and 8 has no outliers, but has unequal variance and fails the Shapiro-Wilk test of normality (not normally distributed). We therefore use a non-parametric Kruskal-Wallis test followed by Dunn's post-hoc test with p-value adjustment.
5.3.3.4 Kruskal-Wallis test
5.3.3.4.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 mg 46 34.1 3 0.000000187 Kruskal-Wallis
5.3.3.4.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 mg 46 0.741 eta2[H] large
5.3.3.4.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 mg d4 _ PFOS d4 _ V… 12 10 0.574 5.66e-1 6.79e-1 ns
## 2 mg d4 _ PFOS d8 _ P… 12 12 4.62 3.92e-6 2.35e-5 ****
## 3 mg d4 _ PFOS d8 _ V… 12 12 4.33 1.51e-5 4.54e-5 ****
## 4 mg d4 _ VAN+PFOS d8 _ P… 10 12 3.83 1.30e-4 2.60e-4 ***
## 5 mg d4 _ VAN+PFOS d8 _ V… 10 12 3.55 3.84e-4 5.75e-4 ***
## 6 mg d8 _ PFOS d8 _ V… 12 12 -0.289 7.73e-1 7.73e-1 ns
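The figure code in the next section relies on an object called `pwc`, whose creation is not echoed. A minimal sketch of how such an object could be built with `rstatix::dunn_test()` follows. The choice of `p.adjust.method = "BH"` is an assumption (the default in `dunn_test()` is "holm"), and the toy data are made up so the block runs on its own.

```r
library(rstatix)

# Toy data (made up for illustration, not the study data)
toy <- data.frame(
  ID = rep(c("d4 _ PFOS", "d4 _ VAN+PFOS", "d8 _ PFOS", "d8 _ VAN+PFOS"), each = 5),
  mg = c(0.15, 0.16, 0.19, 0.21, 0.14,
         0.17, 0.23, 0.20, 0.13, 0.18,
         1.09, 0.82, 0.40, 0.48, 0.85,
         0.79, 0.67, 0.45, 0.50, 0.43)
)
FORMULA <- mg ~ ID

# Pairwise Dunn's test with adjusted p-values (adjustment method is an assumption)
pwc <- dunn_test(toy, FORMULA, p.adjust.method = "BH")
pwc
```

Four groups give six pairwise comparisons, matching the six rows in the table above.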
5.3.3.5 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = COL,labels = c("PFOS day 4","VAN+PFOS day 4","PFOS day 8","VAN+PFOS day 8")) +
scale_y_continuous(name = "mg",limits = c(0,1.75),breaks = seq(0,1.75,0.5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment", labels = c("PFOS\nday 4","VAN+PFOS\nday 4","PFOS\nday 8","VAN+PFOS\nday 8")) +
theme(axis.title.x = element_blank())
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(1.635,1.75,1.4,1.52))
p
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/pfos_day48_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
5.4 Liver day 8
This section analyses the PFOS data from liver on day 8.
5.4.1 ug/g in liver tissue
5.4.1.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_liver_ugg))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_liver_ugg"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL pfos_liver_ugg 12 0.199 0.173
## 2 PFOS pfos_liver_ugg 12 176. 21.7
## 3 VAN pfos_liver_ugg 12 0.205 0.22
## 4 VAN+PFOS pfos_liver_ugg 12 196. 18.5
5.4.1.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This has already been checked for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R11 11 no no 271. 278. 287. 294. 293. 299
## 2 VAN R16 28 no yes 256. 260 268. 275. 273 279
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers: samples from rat_name R11 (CTRL) and R16 (VAN).
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.877 0.000119
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It's also possible to use Levene's test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 12.2 0.00000631
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that the PFOS concentration in liver (ug/g) has two outliers and unequal variance, and fails the Shapiro-Wilk test of normality (not normally distributed). We therefore use a non-parametric Kruskal-Wallis test followed by Dunn's post-hoc test with p-value adjustment.
5.4.1.3 Kruskal-Wallis test
5.4.1.3.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 pfos_liver_ugg 48 36.4 3 0.0000000608 Kruskal-Wallis
5.4.1.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 pfos_liver_ugg 48 0.760 eta2[H] large
5.4.1.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 pfos_liver_u… CTRL PFOS 12 12 3.62 2.98e-4 4.47e-4 ***
## 2 pfos_liver_u… CTRL VAN 12 12 -0.102 9.19e-1 9.19e-1 ns
## 3 pfos_liver_u… CTRL VAN+P… 12 12 4.68 2.85e-6 8.54e-6 ****
## 4 pfos_liver_u… PFOS VAN 12 12 -3.72 2.00e-4 4.00e-4 ***
## 5 pfos_liver_u… PFOS VAN+P… 12 12 1.06 2.87e-1 3.44e-1 ns
## 6 pfos_liver_u… VAN VAN+P… 12 12 4.78 1.72e-6 8.54e-6 ****
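The Wilcoxon alternative mentioned above can be sketched with base R's `pairwise.wilcox.test()`. This is only an illustration of the option, not part of the reported analysis; the toy values are made up, and the "BH" adjustment is an assumption chosen for consistency with the adjusted p-values in this report.

```r
# Toy liver concentrations (made up for illustration, not the study data)
toy_y <- c(0.21, 0.35, 0.12, 0.28, 0.17,
           176,  158,  201,  190,  165,
           0.25, 0.11, 0.31, 0.19, 0.40,
           196,  210,  181,  222,  188)
toy_g <- rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 5)

# Pairwise Wilcoxon rank-sum tests with Benjamini-Hochberg adjustment
pw <- pairwise.wilcox.test(toy_y, toy_g, p.adjust.method = "BH")
pw$p.value   # lower-triangular matrix of adjusted p-values
```

Each cell of the matrix holds the adjusted p-value for one pair of groups, so it carries the same information as the `p.adj` column of the Dunn's test table.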
5.4.1.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "ug/g",limits = c(0,270),breaks = seq(0,270,0.5)) +
scale_y_break(breaks = c(1,140), scales = 3, ticklabels = c(150,200,250), space = 0.3) +
theme(axis.title.x = element_blank(),
axis.line.y.right = element_blank(),
axis.text.y.right = element_blank(),
axis.ticks.y.right = element_blank()) +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(235,265,250,235))
p
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 107, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 107, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot_legend.pdf"), p, device = "pdf", dpi = 300, units = "mm", width = 110, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
mg in liver tissue in all groups
Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Remove rows with NA
dat.clean <- subset(dat, !is.na(pfos_liver_mg))
#dat.clean <- dat %>% select_if(~ !any(is.na(.)))
#dat.clean <- subset(dat, !dat$rat_name %in% c("R01","R30"))
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "pfos_liver_mg"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL pfos_liver_mg 12 0.002 0.002
## 2 PFOS pfos_liver_mg 12 2.11 0.305
## 3 VAN pfos_liver_mg 12 0.002 0.002
## 4 VAN+PFOS pfos_liver_mg 12 2.22 0.341
5.4.1.5 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This has already been checked for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN R16 28 no yes 256. 260 268. 275. 273 279
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains one non-extreme outlier (R16).
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.861 0.0000433
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It's also possible to use Levene's test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 16.5 0.000000257
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that total PFOS in liver (mg) across all groups has one outlier and unequal variance, and fails the Shapiro-Wilk test of normality (not normally distributed). We therefore use a non-parametric Kruskal-Wallis test followed by Dunn's post-hoc test with p-value adjustment.
5.4.1.6 Kruskal-Wallis test
5.4.1.6.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 pfos_liver_mg 48 35.6 3 0.0000000927 Kruskal-Wallis
5.4.1.6.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 - < 0.06 (small effect), 0.06 - < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 pfos_liver_mg 48 0.740 eta2[H] large
5.4.1.6.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 pfos_liver_mg CTRL PFOS 12 12 3.86 1.12e-4 1.67e-4 ***
## 2 pfos_liver_mg CTRL VAN 12 12 -0.146 8.84e-1 8.84e-1 ns
## 3 pfos_liver_mg CTRL VAN+P… 12 12 4.39 1.14e-5 3.42e-5 ****
## 4 pfos_liver_mg PFOS VAN 12 12 -4.01 6.08e-5 1.22e-4 ***
## 5 pfos_liver_mg PFOS VAN+P… 12 12 0.525 6.00e-1 7.20e-1 ns
## 6 pfos_liver_mg VAN VAN+P… 12 12 4.53 5.77e-6 3.42e-5 ****
5.4.1.7 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mg PFOS",limits = c(0,3.5),breaks = seq(0,3.5,0.01)) +
scale_y_break(breaks = c(0.01,1), scales = 3, ticklabels = c(1.0,2.0,3.0), space = 0.3) +
labs(fill = "Treatment") +
theme(axis.title.x = element_blank(),
axis.line.y.right = element_blank(),
axis.text.y.right = element_blank(),
axis.ticks.y.right = element_blank()) +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(3.2,3,3.4,3.2))
p
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_all_plot_legend.pdf"), p, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
5.4.2 Total mg in liver
5.4.2.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_liver_mg"
SUBJECT <- "rat_name"
# Subset to a specific variable
dat.clean <- subset(dat, pfos == "yes")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_liver_mg))
# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
# Order data
dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
# Remove unpaired samples
dat.clean <- dat.clean %>%
group_by(!!sym(SUBJECT)) %>%
filter(n() != 1) %>%
arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
droplevels() %>%
ungroup()
}
5.4.2.2 Assumptions and preliminary tests
The two-sample t-test assumes the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This has already been checked for the whole project.
- No significant outliers in the two groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
## [1] treatment rat_name ordering pfos
## [5] van bw_0 bw_1 bw_2
## [9] bw_3 bw_4 bw_5 bw_6
## [13] bw_7 bw_8 bw_gain cecum_wt
## [17] cecum_wt_bw cecum_norm liver_wt liver_wt_bw
## [21] liver_norm tot_pfos4 blood_vol4_mL pfos_serum4_ugml
## [25] pfos_serum4_ug pfos_serum4_mg pfos_serum4_pct tot_pfos8
## [29] blood_vol8_mL pfos_serum8_ugml pfos_serum8_ug pfos_serum8_mg
## [33] pfos_serum8_pct pfos_change48_pct pfos_liver_ugg pfos_liver_mg
## [37] pfos_liver_pct acetic formic propanoic
## [41] m2_propanoic butanoic m3_butanoic pentanoic
## [45] m4_pentanoic hexanoic heptanoic is.outlier
## [49] is.extreme
## <0 rows> (or 0-length row.names)
Extreme outliers can be bad samples or errors in data entry. If outliers are present, compare the test with and without the outlier to determine whether it matters, or run a non-parametric Wilcoxon test.
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data are normally distributed, the p-value should be greater than 0.05. You can also create QQ plots for each group. A QQ plot shows the correlation between the sample data and the normal distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_liver_mg 0.944 0.546
## 2 VAN+PFOS pfos_liver_mg 0.955 0.711
If both Shapiro-Wilk tests have p > 0.05 and/or the QQ plot follows the reference line, the data can be treated as normally distributed.
If the data do not follow the normal distribution, run a Wilcoxon rank-sum test.
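The per-group check described above can be sketched in base R by running the Shapiro-Wilk test separately within each treatment group. The toy values are made up for illustration; the column names follow the ones used in this section.

```r
# Toy data (made up for illustration, not the study data)
toy <- data.frame(
  treatment     = rep(c("PFOS", "VAN+PFOS"), each = 6),
  pfos_liver_mg = c(2.0, 2.3, 1.8, 2.5, 2.1, 1.9,
                    2.2, 2.6, 1.9, 2.4, 2.0, 2.3)
)

# Shapiro-Wilk p-value for each group separately
sw_p <- tapply(toy$pfos_liver_mg, toy$treatment,
               function(v) shapiro.test(v)$p.value)
sw_p

# TRUE when no group shows a significant deviation from normality
all(sw_p > 0.05)
```

In the report both groups have p > 0.05 (0.546 and 0.711), so the normality assumption is retained.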
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 0.142 0.710
No outliers were identified. The data are normally distributed and have equal variance. Hence we use a t-test.
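The test call in the next section uses a flag named `EQUAL.VAR` whose definition is not echoed. One plausible way to set it, stated here as an assumption rather than the original code, is to derive it from the Levene's test p-value:

```r
# p-value copied from the Levene's test output above
levene.p <- 0.710

# Assumption: EQUAL.VAR is a logical flag, TRUE when variances can be treated as equal
EQUAL.VAR <- levene.p > 0.05
EQUAL.VAR
```

With `var.equal = TRUE` the t-test is the classic Student's version with pooled variance; with `FALSE` it is Welch's version.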
5.4.2.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
t_test(FORMULA,
var.equal = EQUAL.VAR,
detailed = TRUE,
paired = FALSE,
alternative = "two.sided") %>%
add_significance()
stat.test
## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -0.115 2.11 2.22 pfos_l… PFOS VAN+P… 12 12 -0.870 0.394
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
Effect size
The effect size is calculated as Cohen's d.
## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_liver_mg PFOS VAN+PFOS -0.355 12 12 small
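Both the effect size and the t statistic can be checked by hand from numbers already printed in this report: the group standard deviations from the summary table of `pfos_liver_mg` (0.305 for PFOS, 0.341 for VAN+PFOS, n = 12 each) and the mean difference from the t-test output (-0.115). The sketch below assumes the pooled-variance (equal variance) form of the test; small deviations from the printed values are due to rounding of the inputs.

```r
# Inputs copied from the summary table and the t-test output above
n1 <- 12; n2 <- 12
sd1 <- 0.305; sd2 <- 0.341
mean.diff <- -0.115          # estimate: mean(PFOS) - mean(VAN+PFOS)

# Pooled standard deviation
sd.pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))

# Cohen's d, Student's t statistic and two-sided p-value
d      <- mean.diff / sd.pooled
t.stat <- mean.diff / (sd.pooled * sqrt(1 / n1 + 1 / n2))
p.val  <- 2 * pt(-abs(t.stat), df = n1 + n2 - 2)

round(c(d = d, t = t.stat, p = p.val), 3)
```

The results are close to the reported d = -0.355, t = -0.870 and p = 0.394.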
5.4.2.4 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mg PFOS",limits = c(0,3),breaks = seq(0,3,0.5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(3))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2
# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden objects
invisible(gc()) # free up memory without printing the usage report
5.4.3 Pct.
5.4.3.1 Prepare data
This section sets the variables to be used and prepares the data if necessary.
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pfos_liver_pct"
SUBJECT <- "rat_name"
# Subset to animals dosed with PFOS
dat.clean <- subset(dat, pfos == "yes")
# Remove rows with NA
dat.clean <- subset(dat.clean, !is.na(pfos_liver_pct))
# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  # Remove unpaired samples
  dat.clean <- dat.clean %>%
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>%
    ungroup()
}
5.4.3.2 Assumptions and preliminary tests
The two-samples t-test assumes the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This is already done for the whole project.
- No significant outliers in the two groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS R26 14 yes no 238. 246. 248 257 259. 265
## 2 PFOS R27 15 yes no 270. 280. 284. 291. 290. 296
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Extreme outliers can be due to bad samples or data entry errors. If outliers are present, compare the test with and without the outliers to determine whether they matter, or run a non-parametric Wilcoxon test.
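The boxplot rule behind identify_outliers() can be written out in a few lines of base R: values more than 1.5 × IQR beyond the quartiles are flagged as outliers, and values more than 3 × IQR beyond them as extreme points. The sketch below uses made-up numbers, and `flag_outliers` is a hypothetical helper written only for illustration.

```r
# Flag outliers with the boxplot (Tukey fence) rule in base R.
# `flag_outliers` is a hypothetical helper, not an rstatix function.
flag_outliers <- function(x) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  data.frame(
    value = x,
    # Outlier: more than 1.5 * IQR below Q1 or above Q3
    is.outlier = x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr,
    # Extreme point: more than 3 * IQR below Q1 or above Q3
    is.extreme = x < q[1] - 3 * iqr | x > q[2] + 3 * iqr
  )
}

# Simulated values with one clearly deviating observation (not project data)
x <- c(34, 35, 36, 33, 35, 37, 34, 36, 35, 60)
res <- flag_outliers(x)
res[res$is.outlier, ]
```

In the analysis itself the rule is applied per treatment group, so the quartiles are computed within each group rather than on the pooled data.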
Check normality by groups
The normality assumption can be checked by computing the Shapiro-Wilk
test for each group. If the data is normally distributed, the p-value
should be greater than 0.05. You can also create QQ plots for each
group. A QQ plot shows the correlation between the sample data and the normal
distribution.
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
Consequently, we should not rely on only one approach for assessing the normality. A better strategy is to combine visual inspection and statistical test.
## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pfos_liver_pct 0.985 0.997
## 2 VAN+PFOS pfos_liver_pct 0.937 0.456
If both Shapiro-Wilk tests have p > 0.05 and/or the QQ plot follows the reference line, the data can be considered normally distributed.
If the data do not follow a normal distribution, run a Wilcoxon rank-sum test instead.
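The code that produced the Shapiro-Wilk table is not shown in this section. As a rough illustration of the per-group check, the following base R sketch runs shapiro.test() within each treatment group on simulated data. The data frame `sim` and its `value` column are assumptions made for this example.

```r
# Shapiro-Wilk test per group using base R.
# Simulated data; `treatment` mimics the grouping column used in the document.
set.seed(42)
sim <- data.frame(
  treatment = rep(c("PFOS", "VAN+PFOS"), each = 12),
  value = c(rnorm(12, mean = 35, sd = 4), rnorm(12, mean = 37, sd = 4))
)

# Run shapiro.test() within each treatment group and collect W and p
shapiro_by_group <- do.call(rbind, lapply(split(sim, sim$treatment), function(d) {
  st <- shapiro.test(d$value)
  data.frame(treatment = d$treatment[1],
             statistic = unname(st$statistic),
             p = st$p.value)
}))
shapiro_by_group

# Decision rule described in the text: every group has p > 0.05
all_normal <- all(shapiro_by_group$p > 0.05)
all_normal
```

With only 12 animals per group the test has limited power, which is one more reason to inspect the QQ plot alongside the p-values.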
Check the equality of variances
This can be done using the Levene’s test. If the variances of groups are
equal, the p-value should be greater than 0.05.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 0.00119 0.973
If the p-value of the Levene’s test is significant, it suggests that
there is a significant difference between the variances of the two
groups. In such case we should use Welch t-test, which doesn’t assume
the equality of the two variances (var.equal=FALSE). If the
Levene’s test is non-significant we can perform a Student t-test
(var.equal=TRUE).
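The decision described above can be sketched in base R. Levene’s test is written out here as a one-way ANOVA on the absolute deviations from the group medians (median centring is assumed), and its p-value then selects between Student’s and Welch’s t-test. This section does not show how `EQUAL.VAR` is set, so the assignment below is an assumption, `levene_p` is a hypothetical helper, and the data are simulated.

```r
# Levene's test (median-centred) in base R, followed by the choice
# between Student's and Welch's t-test.
# `levene_p` is a hypothetical helper; the data are simulated.
levene_p <- function(y, g) {
  g <- factor(g)
  # Absolute deviation of each observation from its group median
  dev <- abs(y - ave(y, g, FUN = median))
  # A one-way ANOVA on the deviations gives the Levene F test
  tab <- anova(lm(dev ~ g))
  tab[["Pr(>F)"]][1]
}

set.seed(7)
sim <- data.frame(
  treatment = rep(c("PFOS", "VAN+PFOS"), each = 12),
  value = c(rnorm(12, mean = 35, sd = 4), rnorm(12, mean = 37, sd = 4))
)

p_levene <- levene_p(sim$value, sim$treatment)
# Assumed rule: treat variances as equal unless Levene's test is significant
EQUAL.VAR <- p_levene > 0.05
res <- t.test(value ~ treatment, data = sim, var.equal = EQUAL.VAR)
res$method
```

Setting the flag once and passing it on keeps the variance assumption consistent between the t-test and any later test that reuses it.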
Two outliers were identified, but the result of the analysis does not change when they are excluded. The data are normally distributed and have equal variance. Hence we use a t-test.
5.4.3.3 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -2.30 35.0 37.3 pfos_l… PFOS VAN+P… 12 12 -1.46 0.159
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
The output provides:
- .y.: the y variable used in the test.
- group1, group2: the compared groups in the pairwise tests.
- statistic: test statistic used to compute the p-value.
- df: degrees of freedom.
- p: p-value.
- p.adj: the adjusted p-value.
- method: the statistical test used to compare groups.
- p.signif, p.adj.signif: the significance level of p-values and adjusted p-values, respectively.
- estimate: estimate of the effect size. It corresponds to the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.
- estimate1, estimate2: the mean values of the two groups, respectively, for independent samples t-tests.
- alternative: a character string describing the alternative hypothesis.
- conf.low, conf.high: lower and upper bounds of the confidence interval.
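To show how the listed quantities relate to each other, the following base R sketch computes the estimate, test statistic, degrees of freedom, p-value and confidence interval of a Student’s two-sample t-test by hand and compares them with t.test(). The numbers are simulated, and the equal-variance form is assumed.

```r
# Student's two-sample t-test computed by hand and compared with t.test().
set.seed(3)
x <- rnorm(12, mean = 35, sd = 4)  # simulated group 1
y <- rnorm(12, mean = 37, sd = 4)  # simulated group 2

n1 <- length(x)
n2 <- length(y)
estimate <- mean(x) - mean(y)                      # difference in means
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)  # pooled variance
se <- sqrt(sp2 * (1 / n1 + 1 / n2))                # standard error of the difference
statistic <- estimate / se                         # t statistic
df <- n1 + n2 - 2                                  # degrees of freedom
p <- 2 * pt(-abs(statistic), df)                   # two-sided p-value
conf <- estimate + c(-1, 1) * qt(0.975, df) * se   # 95% confidence interval

# Reference result from base R
ref <- t.test(x, y, var.equal = TRUE)
c(statistic = statistic, df = df, p = p, conf.low = conf[1], conf.high = conf[2])
```

With 12 animals per group this gives df = 22, the same degrees of freedom as the Levene’s test output above (df2 = 22).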
Effect size
The effect size is calculated as Cohen’s d.
## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 pfos_liver_pct PFOS VAN+PFOS -0.595 12 12 moderate
5.4.3.4 Create figure
# Prepare stats
stat.test <- stat.test %>% add_xy_position(x = PREDICTOR)
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "% of total dosed PFOS", limits = c(25,45),breaks = seq(25,45,5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment")
p <- p + stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(45))
p2 <- p + labs(subtitle = get_test_label(stat.test, detailed = TRUE))
p2
# Plot for saving without legend
p3 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_plot.pdf"), p3, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden objects
invisible(gc()) # free up memory without printing the usage report
5.5 Total PFOS detected on day 8
This section prepares and analyses the data on total PFOS detected on day 8.
5.5.1 Analysis and Barplot
5.5.1.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Create new dataframe with pfos data
dat.sub <- subset(dat, pfos == "yes")
tmp <- dat.sub[ , c("rat_name","ordering","treatment","tot_pfos8","pfos_serum8_mg","pfos_liver_mg")]
# Calculate measured and unaccounted PFOS (vectorised over all rats, no loop needed)
tmp$total_measured <- tmp$pfos_liver_mg + tmp$pfos_serum8_mg
tmp$leftover <- tmp$tot_pfos8 - tmp$total_measured
# Calculate percentage of detected PFOS in rats
tmp$pct_det <- tmp$total_measured / tmp$tot_pfos8 * 100
tmp$pct_liver <- tmp$pfos_liver_mg / tmp$tot_pfos8 * 100
tmp$pct_serum <- tmp$pfos_serum8_mg / tmp$tot_pfos8 * 100
print("Group: PFOS")
## [1] "Group: PFOS"
## pct_det
## Min. :37.73
## 1st Qu.:42.16
## Median :44.60
## Mean :46.83
## 3rd Qu.:51.56
## Max. :57.59
## pct_liver
## Min. :27.05
## 1st Qu.:33.19
## Median :34.93
## Mean :35.04
## 3rd Qu.:36.71
## Max. :42.56
## pct_serum
## Min. : 7.091
## 1st Qu.: 7.673
## Median : 9.778
## Mean :11.789
## 3rd Qu.:14.862
## Max. :22.244
## [1] "Group: VAN+PFOS"
## pct_det
## Min. :41.29
## 1st Qu.:45.41
## Median :46.69
## Mean :47.88
## 3rd Qu.:49.49
## Max. :60.94
## pct_liver
## Min. :32.68
## 1st Qu.:34.15
## Median :37.31
## Mean :37.34
## 3rd Qu.:39.29
## Max. :44.04
## pct_serum
## Min. : 7.138
## 1st Qu.: 8.186
## Median : 9.289
## Mean :10.540
## 3rd Qu.:11.851
## Max. :19.380
# Analysis of significance between treatment groups
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pct_det"
SUBJECT <- "rat_name"
# Use the data frame prepared above
dat.clean <- tmp
# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  # Remove unpaired samples
  dat.clean <- dat.clean %>%
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>%
    ungroup()
}
# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 13
## treatment rat_name ordering tot_pfos8 pfos_serum8_mg pfos_liver_mg
## <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 VAN+PFOS R47 47 5.62 1.09 2.34
## # ℹ 7 more variables: total_measured <dbl>, leftover <dbl>, pct_det <dbl>,
## # pct_liver <dbl>, pct_serum <dbl>, is.outlier <lgl>, is.extreme <lgl>
## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS pct_det 0.945 0.570
## 2 VAN+PFOS pct_det 0.879 0.0853
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 0.946 0.341
The data contain one non-extreme outlier, are normally distributed, and have equal variance. Therefore we perform an unpaired two-tailed t-test.
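To make the percentage calculation from 5.5.1.1 concrete, the sketch below repeats it for the single outlier, rat R47, using the rounded values printed in the outlier table. Because the inputs are rounded, the results are only approximate.

```r
# Worked example with the rounded values printed for rat R47 above
tot_pfos8      <- 5.62  # total dosed PFOS by day 8 (mg)
pfos_serum8_mg <- 1.09  # PFOS in total blood volume on day 8 (mg)
pfos_liver_mg  <- 2.34  # PFOS in liver on day 8 (mg)

total_measured <- pfos_liver_mg + pfos_serum8_mg    # mg recovered in liver + blood
leftover       <- tot_pfos8 - total_measured        # mg not accounted for
pct_det        <- total_measured / tot_pfos8 * 100  # % of dose detected
pct_liver      <- pfos_liver_mg / tot_pfos8 * 100   # % of dose in liver
pct_serum      <- pfos_serum8_mg / tot_pfos8 * 100  # % of dose in blood
round(c(pct_det = pct_det, pct_liver = pct_liver, pct_serum = pct_serum), 1)
```

This gives roughly 61% of the dose detected, close to the maximum pct_det of 60.94 reported for the VAN+PFOS group. The small difference is what rounding of the printed inputs would produce.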
5.5.1.2 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -1.05 46.8 47.9 pct_det PFOS VAN+P… 12 12 -0.447 0.659
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
The t-test shows no significant difference between the two groups in the total percentage of detected PFOS (where 100% is the total dose).
5.5.1.3 Create barplots
# Prepare data columns for barplot
tmp2 <- rbind(data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$leftover, "type" = "Unaccounted"),
data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$pfos_serum8_mg, "type" = "PFOS serum"),
data.frame("rat_name" = tmp$rat_name, "treatment" = tmp$treatment, "mg" = tmp$pfos_liver_mg, "type" = "PFOS liver"))
# Create plot per rat
p <- ggplot(tmp2, aes(x = rat_name, y = mg, fill = fct_rev(type))) +
geom_bar(position = "fill", stat = "identity") +
theme_pubr(legend = "top") +
facet_grid(~ treatment, scales = "free_x") +
labs(fill = "Sample type", x = "Rats", y = "% of total dosed") +
scale_fill_manual(values = c("Unaccounted"= "#ffffff", "PFOS liver" = "#FECE00", "PFOS serum" = "#cf200D")) +
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
scale_y_continuous(labels = function(x) paste0(x*100, "%"))
p
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/total_rat_barplot.png"), p, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/total_rat_barplot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 70, height = 100)
# Create plot for average per treatment group
p <- ggplot(tmp2, aes(x = treatment, y = mg, fill = fct_rev(type))) +
geom_bar(position = "fill", stat = "identity") +
theme_pubr(legend = "top") +
labs(fill = "Sample type", x = "Treatment",y = "% of total dosed") +
scale_fill_manual(values = c("Unaccounted"= "#ffffff", "PFOS liver" = "#FECE00", "PFOS serum" = "#cf200D")) +
theme(axis.ticks.x=element_blank()) +
scale_y_continuous(labels = function(x) paste0(x*100, "%"))
p
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/total_mean_barplot.png"), p, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/total_mean_barplot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 60, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden objects
invisible(gc()) # free up memory without printing the usage report
5.6 Liver-to-serum (µg/g / µg/mL) ratio
5.6.1 Analysis and boxplot
5.6.1.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Create new dataframe with pfos data
dat.sub <- subset(dat, pfos == "yes")
tmp <- dat.sub[ , c("rat_name","ordering","treatment","pfos_serum8_ugml","pfos_liver_ugg")]
# Calculate liver-to-serum ratio (vectorised over all rats, no loop needed)
tmp$ls_ratio <- tmp$pfos_liver_ugg / tmp$pfos_serum8_ugml
print("Group: PFOS")
## [1] "Group: PFOS"
## ls_ratio
## Min. :2.088
## 1st Qu.:4.267
## Median :5.539
## Mean :5.564
## 3rd Qu.:6.958
## Max. :8.401
## [1] "Group: VAN+PFOS"
## ls_ratio
## Min. : 3.427
## 1st Qu.: 5.279
## Median : 6.466
## Mean : 6.618
## 3rd Qu.: 7.949
## Max. :10.162
tmp %>% group_by(across(all_of("treatment"))) %>% get_summary_stats(!!sym("ls_ratio"), type = "mean_sd")
## # A tibble: 2 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 PFOS ls_ratio 12 5.56 1.91
## 2 VAN+PFOS ls_ratio 12 6.62 1.95
# Analysis of significance between treatment groups
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "ls_ratio"
SUBJECT <- "rat_name"
# Subset to a specific variable
dat.clean <- tmp
# Will you run a paired test? (set variable to `TRUE` or `FALSE`)
PAIRED <- FALSE
# Create formula
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR, sep = "~"))
# Sort data for paired test
if (PAIRED) {
  # Order data
  dat.clean <- arrange(dat.clean, !!sym(SUBJECT))
  # Remove unpaired samples
  dat.clean <- dat.clean %>%
    group_by(!!sym(SUBJECT)) %>%
    filter(n() != 1) %>%
    arrange(!!sym(PREDICTOR), !!sym(SUBJECT)) %>%
    droplevels() %>%
    ungroup()
}
# identify outliers
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  identify_outliers(!!sym(OUTCOME))
## [1] treatment rat_name ordering pfos_serum8_ugml
## [5] pfos_liver_ugg ls_ratio is.outlier is.extreme
## <0 rows> (or 0-length row.names)
## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS ls_ratio 0.972 0.932
## 2 VAN+PFOS ls_ratio 0.973 0.940
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 0.0217 0.884
The data contain no outliers, are normally distributed, and have equal variance. Therefore we perform an unpaired two-tailed t-test.
5.6.1.2 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -1.05 5.56 6.62 ls_rat… PFOS VAN+P… 12 12 -1.34 0.195
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
The t-test shows no significant difference in the liver-to-serum PFOS ratio between the two groups.
5.6.1.3 Create boxplot
p <- ggboxplot(dat.clean, x = "treatment", y = "ls_ratio",
fill = "treatment",
add = "jitter",
add.params = list(size = 1)) +
theme_pubr(legend = "top") +
scale_fill_manual(values = params$COL) +
theme(axis.title.x = element_blank()) +
labs(fill = "Treatment", y = "Liver-to-serum PFOS ratio") +
stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(11))
p
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_boxplot.png"), p, device = "png", dpi = 300, units = "mm", width = 90, height = 100)
ggsave(filename = paste0("plots/animal_data/pfos/",OUTCOME,"_boxplot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 64, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) # clears all objects, including hidden objects
invisible(gc()) # free up memory without printing the usage report
6 PFOS ISOMER ANALYSIS
In this section we investigate linear PFOS (l-PFOS) and branched PFOS (br-PFOS) through the ratio of the two, calculated from peak areas and expressed as the “bl-ratio”. These data are obtained from quantitative mass spectrometry analysis of the retention-time peaks for br-PFOS and l-PFOS, respectively.
The factors investigated are sample type (material: “Serum”, “Liver”), day of measurement (serum only; day: “d4”, “d8”), treatment group (treatment: “PFOS”, “VAN+PFOS”, “Control”, where the controls are spiked controls), and the level of bl-ratio in samples compared with solvent controls spiked with the same batch of PFOS used for oral gavage (type: “Sample”, “Control”).
6.1 Import data
Data is imported from CSV format and bl-ratios are calculated by br-PFOS / l-PFOS.
# Load analysis data
dat <- read.csv("input/pfos_isomer_data.csv", header = TRUE, sep = ";", dec = ",")
# Calculate branched-linear PFOS ratio from peak areas
dat$bl_ratio <- dat$area_branch / dat$area_linear
# Create common predictor for later analysis
dat$mat_treat <- paste0(dat$material, "_", dat$treatment)
save(dat, file = "R_objects/pfos_isomer_data.Rdata")
6.2 Prepare Serum data
Investigation of bl-ratio in treatment groups in serum samples on Day 4 and Day 8.
# Load data
load("R_objects/pfos_isomer_data.Rdata")
# Subset
dat.clean <- subset(dat, material == "Serum" & !treatment == "Control")
# Set names of variables
PREDICTOR <- c("day","treatment")#c("treatment","pfos","van")
OUTCOME <- "bl_ratio"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 6
## day treatment variable n mean sd
## <chr> <chr> <fct> <dbl> <dbl> <dbl>
## 1 d4 PFOS bl_ratio 12 0.22 0.018
## 2 d4 VAN+PFOS bl_ratio 10 0.241 0.021
## 3 d8 PFOS bl_ratio 12 0.222 0.015
## 4 d8 VAN+PFOS bl_ratio 12 0.211 0.012
# Test for outliers
dat.clean %>%
  group_by(across(all_of(PREDICTOR))) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 22
## day treatment id material order type rat_name is_area rt_branch
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <dbl>
## 1 d8 PFOS Serum_R31_d8 Serum A Sample R31 25554 10.3
## # ℹ 13 more variables: area_branch <int>, amt_branch <dbl>, art_branch <dbl>,
## # rt_total <dbl>, area_total <int>, amt_total <dbl>, art_total <dbl>,
## # area_linear <int>, amt_linear <dbl>, bl_ratio <dbl>, mat_treat <chr>,
## # is.outlier <lgl>, is.extreme <lgl>
# Check normality
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.975 0.435
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 42 1.06 0.376
This shows that the data have one outlier, are normally distributed, and have equal variance. Because the model contains two factors (day and treatment) and their interaction, we test the data with a two-way ANOVA followed by Tukey’s honest significant difference test.
6.2.1 Two-way ANOVA test
6.2.1.1 Perform test
Depending on the equality of variances, we now run either
anova_test() (equal variances) or
welch_anova_test() (unequal variances).
if (EQUAL.VAR) {
  res.aov <- dat.clean %>% anova_test(FORMULA)
  res.aov
} else {
  res.aov <- dat.clean %>% welch_anova_test(FORMULA)
  res.aov
}
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 day 1 42 6.954 0.012 * 0.142
## 2 treatment 1 42 0.792 0.379 0.019
## 3 day:treatment 1 42 10.562 0.002 * 0.201
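The structure of the table above (two main effects plus their interaction) can be reproduced with base R. The sketch below fits the same kind of model, `bl_ratio ~ day * treatment`, on simulated and balanced data. The data frame `sim` and the size of the simulated shift are assumptions for illustration only.

```r
# Two-way ANOVA with interaction in base R on simulated, balanced data.
set.seed(11)
sim <- expand.grid(rep = 1:12,
                   day = c("d4", "d8"),
                   treatment = c("PFOS", "VAN+PFOS"))
# Simulate an interaction: only the VAN+PFOS group on d4 is shifted upwards
shift <- ifelse(sim$day == "d4" & sim$treatment == "VAN+PFOS", 0.02, 0)
sim$bl_ratio <- rnorm(nrow(sim), mean = 0.22 + shift, sd = 0.015)

# day * treatment expands to day + treatment + day:treatment
fit <- aov(bl_ratio ~ day * treatment, data = sim)
tab <- summary(fit)[[1]]
tab
```

aov() reports sequential (type I) sums of squares, which coincide with the type II tests shown above only for balanced designs. In the serum data one cell has n = 10 instead of 12, so a base R fit of the real data need not match the table exactly.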
6.2.1.2 Perform posthoc test
if (EQUAL.VAR) {
  pwc <- dat.clean %>% tukey_hsd(FORMULA)
  pwc
} else {
  pwc <- dat.clean %>% games_howell_test(FORMULA)
  pwc
}
## # A tibble: 8 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 day d4 d8 0 -0.0128 -2.27e-2 -0.00286 1.28e-2
## 2 treatment PFOS VAN+P… 0 0.00437 -5.55e-3 0.0143 3.79e-1
## 3 day:treatment d4:PFOS d8:PF… 0 0.00227 -1.59e-2 0.0204 9.87e-1
## 4 day:treatment d4:PFOS d4:VA… 0 0.0211 2.08e-3 0.0402 2.46e-2
## 5 day:treatment d4:PFOS d8:VA… 0 -0.00859 -2.68e-2 0.00958 5.9 e-1
## 6 day:treatment d8:PFOS d4:VA… 0 0.0189 -1.95e-4 0.0379 5.33e-2
## 7 day:treatment d8:PFOS d8:VA… 0 -0.0109 -2.90e-2 0.00731 3.9 e-1
## 8 day:treatment d4:VAN+PF… d8:VA… 0 -0.0297 -4.88e-2 -0.0107 8.26e-4
## # ℹ 1 more variable: p.adj.signif <chr>
Significant differences are observed between days in the VAN+PFOS group and between treatment groups on Day 4. We plot this as a nested analysis with pairwise t-tests on the inner variable (day) and the outer variable (treatment).
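The eight rows of the post hoc table follow from the model structure: one comparison for each two-level main effect and six pairwise comparisons between the four day-by-treatment cells. A minimal base R sketch with TukeyHSD() on simulated data shows the same layout. The data frame `sim` is hypothetical.

```r
# Tukey's HSD on a two-factor model in base R (simulated data).
set.seed(5)
sim <- expand.grid(rep = 1:12,
                   day = c("d4", "d8"),
                   treatment = c("PFOS", "VAN+PFOS"))
sim$bl_ratio <- rnorm(nrow(sim), mean = 0.22, sd = 0.015)

fit <- aov(bl_ratio ~ day * treatment, data = sim)
tk <- TukeyHSD(fit)

# One table per model term: two main effects and the interaction
names(tk)
# The interaction table holds all pairwise comparisons between the 4 cells
nrow(tk[["day:treatment"]])  # 6 comparisons
head(tk[["day:treatment"]])
```

Each table contains the estimated difference, its confidence limits and the adjusted p-value, which correspond to the estimate, conf.low, conf.high and p.adj columns in the output above.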
6.2.1.3 Create figure
## Pairwise comparison for inner variable: day
stat.in <- dat.clean %>%
  group_by(treatment) %>%
  t_test(bl_ratio ~ day,
         paired = FALSE, var.equal = EQUAL.VAR,
         detailed = TRUE, alternative = "two.sided") %>%
  add_significance() %>%
  p_format("p", accuracy = 0.001, trailing.zero = TRUE, new.col = TRUE)
stat.in
## # A tibble: 2 × 18
## treatment estimate estimate1 estimate2 .y. group1 group2 n1 n2
## * <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int>
## 1 PFOS -0.00227 0.220 0.222 bl_ratio d4 d8 12 12
## 2 VAN+PFOS 0.0297 0.241 0.211 bl_ratio d4 d8 10 12
## # ℹ 9 more variables: statistic <dbl>, p <dbl>, df <dbl>, conf.low <dbl>,
## # conf.high <dbl>, method <chr>, alternative <chr>, p.signif <chr>,
## # p.format <chr>
## Pairwise comparison for outer variable: treatment
stat.out <- dat.clean %>%
  t_test(bl_ratio ~ treatment,
         paired = FALSE, var.equal = EQUAL.VAR,
         detailed = TRUE, alternative = "two.sided") %>%
  add_significance() %>%
  p_format("p", accuracy = 0.001, trailing.zero = TRUE, new.col = TRUE)
stat.out
## # A tibble: 1 × 17
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## * <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -0.00379 0.221 0.225 bl_rat… PFOS VAN+P… 24 22 -0.663 0.511
## # ℹ 7 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>, p.format <chr>
## Calculate positions statistics on plot
stat.in <- stat.in %>% add_xy_position(x = "treatment", dodge = 0.8)
stat.out <- stat.out %>% add_xy_position(x = "treatment")
stat.out$y.position <- max(stat.in$y.position)*1.03
# Create plot
p <- ggboxplot(dat.clean, x = "treatment", y = "bl_ratio",
fill = "day",
color = "day",
add = "jitter",
add.params = list(size = 1)) +
theme_pubr(legend = "top") +
scale_color_manual(values = c("d4" = "black","d8" = "black")) +
scale_fill_manual(values = c("#ffffff","#aaaaaa"), name = "Day", labels = c("4","8")) +
scale_y_continuous(name = "Serum B/L ratio", limits = c(0.15,0.3), breaks = seq(0.15,0.3,0.05)) +
theme(axis.title.x = element_blank()) +
guides(color = "none")
p.stat <- p + stat_pvalue_manual(stat.in, tip.length = 0, hide.ns = FALSE) +
  stat_pvalue_manual(stat.out, tip.length = 0, hide.ns = FALSE)
p.stat
suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_serum.pdf", plot = p.stat, device = "pdf", dpi = 300, units = "mm", height = 100, width = 100))
suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_serum.png", plot = p.stat, device = "png", dpi = 300, units = "mm", height = 100, width = 100))
6.3 Prepare Liver data
Investigation of bl-ratio in treatment groups in liver samples. These are only tested for Day 8, which is the only sampling day for liver.
# Load data
load("R_objects/pfos_isomer_data.Rdata")
# Subset
dat.clean <- subset(dat, material == "Liver" & !treatment == "Control")
# Set names of variables
PREDICTOR <- "treatment"#c("treatment","pfos","van")
OUTCOME <- "bl_ratio"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 2 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 PFOS bl_ratio 12 0.172 0.013
## 2 VAN+PFOS bl_ratio 12 0.18 0.009
# Test for outliers
dat.clean %>%
  group_by(across(all_of(PREDICTOR))) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 3 × 22
## treatment id material order type day rat_name is_area rt_branch
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <dbl>
## 1 PFOS Liver_R29_d8 Liver B Sample d8 R29 19478 10.3
## 2 PFOS Liver_R36_d8 Liver B Sample d8 R36 18635 10.3
## 3 VAN+PFOS Liver_R41_d8 Liver B Sample d8 R41 19421 10.3
## # ℹ 13 more variables: area_branch <int>, amt_branch <dbl>, art_branch <dbl>,
## # rt_total <dbl>, area_total <int>, amt_total <dbl>, art_total <dbl>,
## # area_linear <int>, amt_linear <dbl>, bl_ratio <dbl>, mat_treat <chr>,
## # is.outlier <lgl>, is.extreme <lgl>
# Check normality
# Run Shapiro test
dat.clean %>%
  group_by(!!sym(PREDICTOR)) %>%
  shapiro_test(!!sym(OUTCOME))
## # A tibble: 2 × 4
## treatment variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 PFOS bl_ratio 0.923 0.312
## 2 VAN+PFOS bl_ratio 0.901 0.165
# Check the homogeneity of variances with Levene's test
# Run test
dat.clean %>% levene_test(FORMULA)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 22 0.675 0.420
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
Three outliers were identified. The data are normally distributed and have equal variance. Hence we use a two-tailed t-test.
6.3.1 PERFORM TEST
T-test
We are now ready to perform the test
stat.test <- dat.clean %>%
  t_test(FORMULA,
         var.equal = EQUAL.VAR,
         detailed = TRUE,
         paired = FALSE,
         alternative = "two.sided") %>%
  add_significance()
stat.test
## # A tibble: 1 × 16
## estimate estimate1 estimate2 .y. group1 group2 n1 n2 statistic p
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
## 1 -0.00847 0.172 0.180 bl_ra… PFOS VAN+P… 12 12 -1.82 0.0817
## # ℹ 6 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## # alternative <chr>, p.signif <chr>
Effect size
The effect size is calculated as Cohen’s d.
## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 bl_ratio PFOS VAN+PFOS -0.745 12 12 moderate
No significant difference is observed between treatment groups in the liver samples (p = 0.082). We present this in a plot using the above statistics.
6.3.2 Create figure
# Load plotting parameters (the environment was cleared earlier, so params is not available otherwise)
params <- readRDS("R_objects/animal_params.RDS")
# Create plot
p <- ggboxplot(dat.clean, x = "treatment", y = "bl_ratio",
               fill = "treatment",
               add = "jitter",
               add.params = list(size = 1)) +
  theme_pubr(legend = "top") +
  scale_fill_manual(values = params$COL, name = "Treatment") +
  scale_y_continuous(name = "Liver B/L ratio", limits = c(0.15,0.3), breaks = seq(0.15,0.3,0.05)) +
  theme(axis.title.x = element_blank()) +
  stat_pvalue_manual(stat.test, tip.length = 0, hide.ns = FALSE, y.position = c(0.22))
suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_liver.pdf", plot = p, device = "pdf", dpi = 300, units = "mm", height = 100, width = 60))
suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_liver.png", plot = p, device = "png", dpi = 300, units = "mm", height = 100, width = 60))
6.4 Prepare test of “material” day 8
Here we aim to test differences in bl-ratio between liver and serum on Day 8. We exclude Day 4 because there are no equivalent liver data to compare with. From the previous sections we know that Day 4 serum has a slightly higher bl-ratio than Day 8 serum and that both are higher than liver, so any statistically significant difference between materials on Day 8 applies to Day 4 serum samples as well. The samples presented here include spiked negative controls and solvent controls, all of which had the same batch of PFOS added directly to the sample before analysis. These controls reflect the batch proportion of l-PFOS to br-PFOS.
load("R_objects/pfos_isomer_data.Rdata")
# Subset
dat.clean <- subset(dat, day == "d8") #!treatment == "Control" &
# Set names of variables
PREDICTOR <- c("material","treatment")
OUTCOME <- "bl_ratio"
SUBJECT <- "rat_name"
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 6 × 6
## material treatment variable n mean sd
## <chr> <chr> <fct> <dbl> <dbl> <dbl>
## 1 Liver Control bl_ratio 12 0.075 0.009
## 2 Liver PFOS bl_ratio 12 0.172 0.013
## 3 Liver VAN+PFOS bl_ratio 12 0.18 0.009
## 4 Serum Control bl_ratio 4 0.067 0.014
## 5 Serum PFOS bl_ratio 12 0.222 0.015
## 6 Serum VAN+PFOS bl_ratio 12 0.211 0.012
# Test for outliers
dat.clean %>%
  group_by(across(all_of(PREDICTOR))) %>%
  identify_outliers(!!sym(OUTCOME))
## # A tibble: 4 × 22
## material treatment id order type day rat_name is_area rt_branch
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <dbl>
## 1 Liver PFOS Liver_R29_d8 B Sample d8 R29 19478 10.3
## 2 Liver PFOS Liver_R36_d8 B Sample d8 R36 18635 10.3
## 3 Liver VAN+PFOS Liver_R41_d8 B Sample d8 R41 19421 10.3
## 4 Serum PFOS Serum_R31_d8 A Sample d8 R31 25554 10.3
## # ℹ 13 more variables: area_branch <int>, amt_branch <dbl>, art_branch <dbl>,
## # rt_total <dbl>, area_total <int>, amt_total <dbl>, art_total <dbl>,
## # area_linear <int>, amt_linear <dbl>, bl_ratio <dbl>, mat_treat <chr>,
## # is.outlier <lgl>, is.extreme <lgl>
# Check normality
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.970 0.117
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 5 58 0.433 0.824
This shows that the data have four non-critical outliers, are normally distributed (Shapiro-Wilk p = 0.117) and have equal variance (Levene's test p = 0.824). We can therefore test the data with an ANOVA on material × treatment followed by Tukey’s honest significance test.
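The Shapiro-Wilk and Levene outputs above come from chunks whose code is not echoed, and the flag EQUAL.VAR is used in the chunks that follow without being defined in this section. The following is a minimal base-R sketch of how such a flag could be derived. It assumes a 0.05 threshold and a median-centred Levene test; the toy data frame, the helper levene_p and the flag NORMAL are hypothetical and are not taken from the project code.

```r
# Toy data standing in for dat.clean (values are simulated, not project data)
set.seed(1)
toy <- data.frame(
  group = rep(c("A", "B", "C"), each = 12),
  y     = rnorm(36, mean = rep(c(0.07, 0.17, 0.22), each = 12), sd = 0.012)
)

# Normality of the model residuals (Shapiro-Wilk)
model <- lm(y ~ group, data = toy)
sw_p  <- shapiro.test(residuals(model))$p.value

# Median-centred Levene test: one-way ANOVA on absolute deviations
# from the group medians (hypothetical helper)
levene_p <- function(y, g) {
  g   <- factor(g)
  dev <- abs(y - ave(y, g, FUN = median))
  anova(lm(dev ~ g))[1, "Pr(>F)"]
}
lev_p <- levene_p(toy$y, toy$group)

# Hypothetical flags used to choose between test variants
NORMAL    <- sw_p  > 0.05
EQUAL.VAR <- lev_p > 0.05
```

Keeping the decision in a single logical flag lets the same chunk template switch between the equal-variance and unequal-variance tests for every outcome variable.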
6.4.1 ANOVA Two-Way test
6.4.1.1 Perform test
With equal variances we run the two-way ANOVA with anova_test(); if the variances had differed, the code below would fall back to welch_anova_test().
if(EQUAL.VAR) {
res.aov <- dat.clean %>% anova_test(FORMULA)
res.aov
} else {
res.aov <- dat.clean %>% welch_anova_test(FORMULA)
res.aov
}
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 material 1 58 97.473 4.98e-14 * 0.627
## 2 treatment 2 58 520.433 8.95e-38 * 0.947
## 3 material:treatment 2 58 23.081 4.22e-08 * 0.443
6.4.1.2 Perform posthoc test
if(EQUAL.VAR) {
pwc <- dat.clean %>% tukey_hsd(FORMULA)
pwc
} else {
pwc <- dat.clean %>% games_howell_test(FORMULA)
pwc
}
## # A tibble: 19 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 material Liver Serum 0 0.0530 0.0469 0.0591 1.49e-11
## 2 treatment Contr… PFOS 0 0.111 0.101 0.120 1.49e-11
## 3 treatment Contr… VAN+P… 0 0.110 0.100 0.119 1.49e-11
## 4 treatment PFOS VAN+P… 0 -0.00120 -0.00962 0.00723 9.38e- 1
## 5 material:treat… Liver… Serum… 0 -0.00797 -0.0286 0.0127 8.63e- 1
## 6 material:treat… Liver… Liver… 0 0.0970 0.0824 0.112 1.49e-11
## 7 material:treat… Liver… Serum… 0 0.147 0.133 0.162 1.49e-11
## 8 material:treat… Liver… Liver… 0 0.105 0.0909 0.120 1.49e-11
## 9 material:treat… Liver… Serum… 0 0.136 0.122 0.151 1.49e-11
## 10 material:treat… Serum… Liver… 0 0.105 0.0843 0.126 1.49e-11
## 11 material:treat… Serum… Serum… 0 0.155 0.135 0.176 1.49e-11
## 12 material:treat… Serum… Liver… 0 0.113 0.0928 0.134 1.49e-11
## 13 material:treat… Serum… Serum… 0 0.144 0.124 0.165 1.49e-11
## 14 material:treat… Liver… Serum… 0 0.0503 0.0357 0.0649 1.52e-11
## 15 material:treat… Liver… Liver… 0 0.00847 -0.00613 0.0231 5.31e- 1
## 16 material:treat… Liver… Serum… 0 0.0395 0.0249 0.0541 1.05e- 9
## 17 material:treat… Serum… Liver… 0 -0.0419 -0.0565 -0.0273 1.77e-10
## 18 material:treat… Serum… Serum… 0 -0.0109 -0.0255 0.00374 2.57e- 1
## 19 material:treat… Liver… Serum… 0 0.0310 0.0164 0.0456 7.53e- 7
## # ℹ 1 more variable: p.adj.signif <chr>
A significant overall difference is observed between liver and serum, and between the spiked controls and both treatment groups within each material. The only non-significant comparisons are between the controls run with each material and between the PFOS and VAN+PFOS groups, both overall and within serum and liver separately. We present these data as a nested plot with material (serum or liver) as the inner grouping and treatment (PFOS, VAN+PFOS and Control) as the outer grouping, annotated with t-tests for the inner comparisons and ANOVA with Tukey’s test for the outer ones.
6.4.2 Create figure
## Pairwise comparison for inner variable
stat.in <- dat.clean %>%
group_by(treatment) %>%
t_test(bl_ratio ~ order,
paired = FALSE, var.equal = EQUAL.VAR,
detailed = TRUE, alternative = "two.sided") %>%
add_significance() %>%
p_format("p", accuracy = 0.001, trailing.zero = TRUE, new.col = TRUE)
stat.in
## # A tibble: 3 × 18
## treatment estimate estimate1 estimate2 .y. group1 group2 n1 n2
## * <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <int> <int>
## 1 Control -0.00797 0.0668 0.0747 bl_ratio A B 4 12
## 2 PFOS 0.0503 0.222 0.172 bl_ratio A B 12 12
## 3 VAN+PFOS 0.0310 0.211 0.180 bl_ratio A B 12 12
## # ℹ 9 more variables: statistic <dbl>, p <dbl>, df <dbl>, conf.low <dbl>,
## # conf.high <dbl>, method <chr>, alternative <chr>, p.signif <chr>,
## # p.format <chr>
## Pairwise comparison for outer variable
stat.out <- dat.clean %>%
anova_test(bl_ratio ~ treatment) %>%
add_significance() %>%
p_format("p", accuracy = 0.001, trailing.zero = TRUE, new.col = TRUE)
stat.out
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges p.signif p.format
## 1 treatment 2 61 188.131 8.13e-27 * 0.86 **** <0.001
pwc2 <- dat.clean %>%
tukey_hsd(bl_ratio ~ treatment) %>%
add_significance() %>%
p_format("p.adj", accuracy = 0.001, trailing.zero = TRUE, new.col = TRUE)
pwc2
## # A tibble: 3 × 10
## term group1 group2 null.value estimate conf.low conf.high p.adj
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 treatment Control PFOS 0 0.124 0.107 0.141 2.03e-11
## 2 treatment Control VAN+PFOS 0 0.123 0.106 0.140 2.03e-11
## 3 treatment PFOS VAN+PFOS 0 -0.00120 -0.0165 0.0141 9.81e- 1
## # ℹ 2 more variables: p.adj.signif <chr>, p.adj.format <chr>
## Calculate positions statistics on plot
stat.in <- stat.in %>% add_xy_position(x = "treatment", dodge = 0.8)
pwc2 <- pwc2 %>% add_xy_position(x = "treatment")
pwc2$y.position <- max(stat.in$y.position)*1.1
# Create plot
p <- ggboxplot(dat.clean, x = "treatment", y = "bl_ratio",
fill = "order",
color = "order",
add = "jitter",
add.params = list(size = 1)) +
theme_pubr(legend = "top") +
scale_color_manual(values = c("A" = "black","B" = "black")) +
scale_fill_manual(values = c("B" = "#FECE00", "A" = "#cf200D"), name = "Sample type", labels = c("A" = "Serum", "B" = "Liver")) +
scale_y_continuous(name = "B/L ratio", limits = c(0.05,0.3), breaks = seq(0.05,0.3,0.05)) +
theme(axis.title.x = element_blank()) +
guides(color = "none")
p.stat <- p + stat_pvalue_manual(stat.in, label = "p.signif", tip.length = 0, hide.ns = FALSE, y.position = c(0.11,0.25,0.25)) +
stat_pvalue_manual(pwc2, label = "p.adj.signif", tip.length = 0, hide.ns = FALSE, y.position = c(0.27,0.30,0.285))
p.stat
suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_material.pdf", plot = p.stat, device = "pdf", dpi = 300, units = "mm", height = 100, width = 136))
suppressMessages(ggsave(filename = "plots/animal_data/pfos/isomer_branched-linear_material.png", plot = p.stat, device = "png", dpi = 300, units = "mm", height = 100, width = 136))
7 SHORT CHAIN FATTY ACID DATA
The following section handles the analysis of short-chain fatty acids (SCFAs) from colonic samples collected on day 8. Ten SCFAs were analysed by MS Omics A/S (Denmark), delivered as concentrations in millimolar (mM), and are tested accordingly here.
Concentrations in mM were recorded from proximal colonic samples collected from all animals at dissection.
The analyses comprise an overall Principal Coordinate Analysis (PCoA) with PERMANOVA and individual boxplots comparing compound concentrations between treatment groups.
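As a rough illustration of what a Bray-Curtis PCoA does, the sketch below computes the dissimilarities by hand and ordinates them with base R's cmdscale(). This is not the method used in this report, which relies on vegan::capscale(); the toy matrix and the helper bray_curtis are hypothetical.

```r
# Toy concentration table (simulated, not project data): 6 samples x 3 compounds
set.seed(42)
toy <- matrix(runif(18, min = 0.1, max = 10), nrow = 6,
              dimnames = list(paste0("S", 1:6), c("a", "b", "c")))

# Bray-Curtis dissimilarity between samples i and j:
# sum(|x_i - x_j|) / sum(x_i + x_j)
bray_curtis <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
  }
  as.dist(d)
}

d    <- bray_curtis(toy)
pcoa <- cmdscale(d, k = 2, eig = TRUE)  # classical MDS, i.e. PCoA
head(pcoa$points)                       # sample scores on the first two axes
```

For non-negative concentrations the dissimilarity ranges from 0 (identical profiles) to 1 (no shared compounds), which is why it suits tables with many values below the detection limit better than Euclidean distance.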
7.1 PCOA AND PERMANOVA ANALYSIS OF SCFA
# Load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
dat.clean <- dat %>% select(rat_name, treatment, pfos, van,
acetic, propanoic, m2_propanoic, butanoic, m3_butanoic, pentanoic, hexanoic, formic, m4_pentanoic, heptanoic) # formic, m4_pentanoic and heptanoic are dropped from the SCFA table below due to low sample count
# Remove R08
dat.clean <- subset(dat.clean, rat_name != "R08") # Mostly <LOD; suspected error in sample handling
# Create SCFA table
row.names(dat.clean) <- dat.clean$rat_name
dat.SCFA <- dat.clean %>% select(acetic, propanoic, m2_propanoic, butanoic, m3_butanoic, pentanoic, hexanoic)
dat.SCFA
## acetic propanoic m2_propanoic butanoic m3_butanoic pentanoic
## R01 7.868665 0.3921931 0.10713169 1.71092008 0.06609145 0.05975663
## R02 7.529961 0.2648977 0.03172203 0.83718815 0.01943444 0.03931031
## R03 7.334644 0.3076036 0.04957959 1.55126305 0.02795564 0.07331600
## R04 7.211624 0.4002376 0.05054027 1.40421430 0.03441629 0.03030373
## R05 9.522715 0.2276087 0.04910177 2.42728491 0.02626586 0.04521286
## R06 6.277653 0.2297736 0.03902360 1.21216936 0.02140372 0.05487809
## R07 7.695140 0.2805200 0.07144139 1.92250223 0.04966964 0.06921387
## R09 7.035536 0.1942356 0.03438163 0.70002530 0.01989101 0.02446537
## R10 10.465522 0.3485516 0.06152928 1.33044438 0.04261478 0.06032153
## R11 8.556428 0.3814438 0.04937821 1.65193422 0.02931262 0.07490580
## R12 6.799043 0.1931678 0.04667801 0.64906530 0.02665150 0.02992233
## R25 9.315862 0.3044661 0.04532504 1.97734359 0.02870620 0.03622949
## R26 11.183570 0.3442234 0.08032019 2.17634348 0.05628301 0.10358275
## R27 3.818461 0.1384521 0.02690980 0.34833126 0.01873754 0.01690413
## R28 8.430942 0.4858121 0.07477591 1.27083868 0.05009847 0.06523660
## R29 4.116468 0.1978577 0.04098487 0.46145414 0.03295323 0.02816228
## R30 3.503725 0.1516586 0.03871071 0.25109881 0.02896771 0.03837706
## R31 8.719509 0.2917921 0.04451333 1.52335000 0.03458602 0.05070083
## R32 6.642488 0.2224302 0.03314928 0.75935657 0.01985540 0.02406519
## R33 9.218475 0.3206063 0.04025070 1.07517412 0.02147511 0.04536975
## R34 6.435524 0.2135714 0.04781338 1.00828074 0.02697322 0.04630859
## R35 8.880293 0.2698323 0.06744288 1.69473785 0.04507291 0.08184510
## R36 5.979113 0.2441023 0.06011517 1.17138908 0.03705746 0.04471346
## R13 3.051265 0.1587890 0.00000000 0.05988682 0.00000000 0.00000000
## R14 4.022243 0.2649193 0.00000000 0.16343697 0.00000000 0.00000000
## R15 2.060242 0.4008345 0.00000000 0.32230558 0.00000000 0.00000000
## R16 2.065274 0.3575574 0.00000000 0.05856690 0.00000000 0.00000000
## R17 3.685242 0.2450650 0.00000000 0.26577665 0.00000000 0.00000000
## R18 2.276606 0.4004846 0.00000000 0.15572863 0.00000000 0.00000000
## R19 2.775194 0.2418976 0.06122209 0.16288653 0.06118418 0.05317155
## R20 3.659261 0.2127580 0.00000000 0.74450130 0.00000000 0.00000000
## R21 3.008143 0.2394504 0.00000000 0.09711944 0.00000000 0.00000000
## R22 2.797761 0.1244424 0.00000000 0.07069875 0.00000000 0.00000000
## R23 3.818070 0.2752683 0.00000000 0.08078066 0.00000000 0.00000000
## R24 3.078666 0.2428717 0.00000000 0.06046572 0.00000000 0.00000000
## R37 2.540922 0.3583397 0.00000000 0.33848057 0.00000000 0.00000000
## R38 2.216871 0.3811121 0.00000000 0.17483102 0.00000000 0.00000000
## R39 1.937005 0.2666815 0.00000000 0.14572444 0.00000000 0.00000000
## R40 4.048858 0.1958996 0.00000000 0.07459486 0.00000000 0.00000000
## R41 3.343698 0.2100220 0.00000000 0.15205560 0.00000000 0.00000000
## R42 2.766247 0.2873487 0.09295739 0.26150544 0.07561294 0.08973810
## R43 3.201230 0.5728650 0.00000000 0.15850144 0.00000000 0.00000000
## R44 5.601203 0.3565805 0.00000000 0.09144850 0.00000000 0.00000000
## R45 3.817896 0.2534805 0.00000000 0.23164761 0.00000000 0.00000000
## R46 4.270382 0.3910062 0.00000000 0.19248275 0.00000000 0.00000000
## R47 3.335548 0.3513265 0.02224919 0.06549052 0.00000000 0.00000000
## R48 2.386966 0.2748988 0.00000000 0.43317059 0.00000000 0.00000000
## hexanoic
## R01 0.08802237
## R02 0.01549625
## R03 0.10609609
## R04 0.00000000
## R05 0.06308093
## R06 0.04631728
## R07 0.06971790
## R09 0.04058588
## R10 0.06044937
## R11 0.12464955
## R12 0.01712496
## R25 0.08287127
## R26 0.18375741
## R27 0.01467142
## R28 0.04373499
## R29 0.02202622
## R30 0.03269296
## R31 0.02251866
## R32 0.01674318
## R33 0.14119839
## R34 0.05045684
## R35 0.14247081
## R36 0.08931266
## R13 0.00000000
## R14 0.00000000
## R15 0.00000000
## R16 0.00000000
## R17 0.00000000
## R18 0.00000000
## R19 0.04677862
## R20 0.00000000
## R21 0.00000000
## R22 0.00000000
## R23 0.00000000
## R24 0.00000000
## R37 0.00000000
## R38 0.00000000
## R39 0.00000000
## R40 0.00000000
## R41 0.00000000
## R42 0.07295131
## R43 0.00000000
## R44 0.00000000
## R45 0.00000000
## R46 0.00000000
## R47 0.00000000
## R48 0.00000000
# Change all zeros to NA
dat.clean[dat.clean == 0] <- NA
# Summary samples in groups
tb <- dat.clean %>% group_by(across(all_of("treatment"))) %>% get_summary_stats(type = "mean_sd")
7.1.1 DAtest (treatment)
# Test best method
filt.test <- testDA(t(dat.SCFA), predictor = dat.clean$treatment, effectSize = 10, relative = FALSE, k = c(1,1,2))
## Warning in testDA(t(dat.SCFA), predictor = dat.clean$treatment, effectSize =
## 10, : Dataset contains very few features
## Running on 7 cores
## Warning in testDA(t(dat.SCFA), predictor = dat.clean$treatment, effectSize =
## 10, : Very few features spiked. Increase 'k' or set 'R' to more than 50 to
## ensure proper estimation of AUC and FPR
## Warning in testDA(t(dat.SCFA), predictor = dat.clean$treatment, effectSize =
## 10, : Set to spike more than half of the dataset, which might give unreliable
## estimates, Change k argument
## predictor is assumed to be a categorical variable with 4 levels: CTRL, PFOS, VAN, VAN+PFOS
## Spikeing...
## Testing 7 methods 20 times each...
## Method AUC FPR FDR Power Score Score.5% Score.95%
## ANOVA (aov) 1 0 0 1.0 0.50 0.04 0.5 *
## Log ANOVA (lao) 1 0 0 1.0 0.50 0.04 0.5 *
## LIMMA (lim) 1 0 0 1.0 0.50 -0.38 0.5 *
## Linear regression (lrm) 1 0 0 1.0 0.50 -0.38 0.5 *
## Log LIMMA (lli) 1 0 0 1.0 0.50 -0.38 0.5 *
## Log Linear reg. (llm) 1 0 0 1.0 0.50 -0.38 0.5 *
## Kruskal-Wallis (kru) 1 0 0 0.5 0.25 -0.26 0.5 *
## Warning: The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## ℹ Please use the `fun` argument instead.
## ℹ The deprecated feature was likely used in the DAtest package.
## Please report the issue at <https://github.com/Russel88/DAtest/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The `fun.ymin` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## ℹ Please use the `fun.min` argument instead.
## ℹ The deprecated feature was likely used in the DAtest package.
## Please report the issue at <https://github.com/Russel88/DAtest/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The `fun.ymax` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## ℹ Please use the `fun.max` argument instead.
## ℹ The deprecated feature was likely used in the DAtest package.
## Please report the issue at <https://github.com/Russel88/DAtest/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## pval pval.adj log2FC ordering Feature
## acetic 1.556877e-02 1.816357e-02 0.06584986 yes>no acetic
## propanoic 1.234740e-07 4.321591e-07 1.43164499 yes>no propanoic
## m2_propanoic 9.330018e-03 1.306202e-02 -1.15449215 no>yes m2_propanoic
## butanoic 2.009708e-08 1.406795e-07 -1.27840244 no>yes butanoic
## m3_butanoic 7.450974e-02 7.450974e-02 -0.86743747 no>yes m3_butanoic
## pentanoic 5.798769e-03 1.014785e-02 -1.25935721 no>yes pentanoic
## hexanoic 2.431332e-04 5.673108e-04 -1.65518822 no>yes hexanoic
## Method
## acetic t-test (ttt)
## propanoic t-test (ttt)
## m2_propanoic t-test (ttt)
## butanoic t-test (ttt)
## m3_butanoic t-test (ttt)
## pentanoic t-test (ttt)
## hexanoic t-test (ttt)
## pval pval.adj Feature Method
## acetic 8.187692e-02 9.552307e-02 acetic ANOVA (aov)
## propanoic 2.239608e-07 1.567725e-06 propanoic ANOVA (aov)
## m2_propanoic 6.695521e-02 9.373730e-02 m2_propanoic ANOVA (aov)
## butanoic 4.750816e-07 1.662786e-06 butanoic ANOVA (aov)
## m3_butanoic 3.469938e-01 3.469938e-01 m3_butanoic ANOVA (aov)
## pentanoic 4.934198e-02 8.634847e-02 pentanoic ANOVA (aov)
## hexanoic 2.998526e-03 6.996560e-03 hexanoic ANOVA (aov)
## Feature pval log2FC coverage ordering pval.adj
## 1 acetic 0.09269073 -0.0010949396 1000 no>yes 0.1081392
## 2 propanoic 0.07239276 0.0051716716 1000 yes>no 0.1013499
## 3 m2_propanoic 0.05649435 0.0013391493 1000 yes>no 0.1013499
## 4 butanoic 0.07079292 -0.0075001412 1000 no>yes 0.1013499
## 5 m3_butanoic 0.06929307 0.0007623341 1000 yes>no 0.1013499
## 6 pentanoic 0.07049295 0.0009115844 1000 yes>no 0.1013499
## 7 hexanoic 0.44345565 0.0016915617 10000 yes>no 0.4434557
## Method
## 1 Permutation (per)
## 2 Permutation (per)
## 3 Permutation (per)
## 4 Permutation (per)
## 5 Permutation (per)
## 6 Permutation (per)
## 7 Permutation (per)
## Feature pval log2FC coverage ordering pval.adj
## 1 acetic 0.01829817 0.030019941 10000 yes>no 0.021347865
## 2 propanoic 0.00010000 0.073564594 10000 yes>no 0.000350000
## 3 m2_propanoic 0.00509949 -0.005387152 10000 no>yes 0.007139286
## 4 butanoic 0.00010000 -0.102207484 10000 no>yes 0.000350000
## 5 m3_butanoic 0.07109289 -0.003099900 10000 no>yes 0.071092891
## 6 pentanoic 0.00219978 -0.005363531 10000 no>yes 0.003849615
## 7 hexanoic 0.00029997 -0.007433509 10000 no>yes 0.000699930
## Method
## 1 Permutation (per)
## 2 Permutation (per)
## 3 Permutation (per)
## 4 Permutation (per)
## 5 Permutation (per)
## 6 Permutation (per)
## 7 Permutation (per)
## pval pval.adj Feature Method
## acetic 6.138004e-02 6.138004e-02 acetic Kruskal-Wallis (kru)
## propanoic 2.245807e-07 1.572065e-06 propanoic Kruskal-Wallis (kru)
## m2_propanoic 2.640232e-05 3.080271e-05 m2_propanoic Kruskal-Wallis (kru)
## butanoic 2.509298e-05 3.080271e-05 butanoic Kruskal-Wallis (kru)
## m3_butanoic 5.597385e-06 1.555731e-05 m3_butanoic Kruskal-Wallis (kru)
## pentanoic 6.667417e-06 1.555731e-05 pentanoic Kruskal-Wallis (kru)
## hexanoic 1.537043e-05 2.689826e-05 hexanoic Kruskal-Wallis (kru)
7.1.1.1 Conclusions
Significant impact from treatment type was detected:
- ANOVA: propanoic, butanoic and hexanoic acid
- Permutation test: PFOS = no impact; VAN = all but m3_butanoic
- Kruskal-Wallis: all but acetic acid
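The per-feature tables above pair each raw p-value with an adjusted one. The adjusted values are consistent with a Benjamini-Hochberg correction across the seven features (for example, the smallest ANOVA p-value, 2.24e-07, multiplied by 7 gives the reported 1.57e-06). The sketch below shows the same pattern in base R on simulated data; it is an illustration of the idea, not the DAtest implementation, and all object names are hypothetical.

```r
# Toy table (simulated, not project data): 3 features, 4 groups of 6 samples
set.seed(7)
grp <- factor(rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 6))
toy <- data.frame(
  f1 = rnorm(24, mean = ifelse(grepl("VAN", grp), 2, 5)),  # strong group effect
  f2 = rnorm(24, mean = 1),                                # no group effect
  f3 = rnorm(24, mean = 3)                                 # no group effect
)

# One test per feature, keeping only the raw p-value
p_aov <- sapply(toy, function(y) summary(aov(y ~ grp))[[1]][1, "Pr(>F)"])
p_kru <- sapply(toy, function(y) kruskal.test(y ~ grp)$p.value)

# Benjamini-Hochberg adjustment across features
res <- data.frame(
  Feature      = names(toy),
  pval.aov     = p_aov,
  pval.adj.aov = p.adjust(p_aov, method = "fdr"),
  pval.kru     = p_kru,
  pval.adj.kru = p.adjust(p_kru, method = "fdr")
)
res
```

Running a parametric and a rank-based test side by side, as done here, makes it easy to see which features are only called significant under one set of assumptions.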
7.1.1.2 PERMANOVA and PCoA (visualization)
# Scaling SCFA data
scaled.SCFA <- scale(dat.SCFA, center = FALSE, scale = TRUE)
# Calculating Bray-Curtis PCoA with capscale
tmp2 <- capscale(as.matrix(scaled.SCFA) ~ 1, data = as.matrix(scaled.SCFA), distance = "bray", metaMDS = TRUE)
# Collect data for plotting
mds.samples <- data.frame(tmp2$CA$u)
mds.scfa <- data.frame(tmp2$CA$v)
# Prepare point zero and labels for arrows
mds.scfa$label1 <- row.names(mds.scfa)
mds.scfa$x <- 0
mds.scfa$y <- 0
# Rename labels
mds.scfa <- mds.scfa %>% mutate("label2" = case_when(label1 == "acetic" ~ "Acetate",
label1 == "propanoic" ~ "Propionate",
label1 == "m2_propanoic" ~ "Isobutyrate",
label1 == "butanoic" ~ "Butyrate",
label1 == "m3_butanoic" ~ "Isovalerate",
label1 == "pentanoic" ~ "Valerate",
label1 == "hexanoic" ~ "Caproate"))
# Bind with main data
dat.mds <- cbind(dat.clean, mds.samples)
# Create plot
p <- ggplot() +
geom_point(data = dat.mds, mapping = aes(x = MDS1, y = MDS2, color = treatment)) +
stat_ellipse(data = dat.mds, mapping = aes(x = MDS1, y = MDS2, color = treatment, fill = treatment), geom = "polygon", alpha = 0.1) +
geom_segment(data = mds.scfa, mapping = aes(x=x, y=y, xend=0.35*MDS1, yend=0.35*MDS2),
lineend = "butt",
linejoin = "round",
size = 0.5,
arrow = arrow(length = unit(0.3, 'cm'))) +
geom_label_repel(data = mds.scfa,
mapping = aes(x = 0.35*MDS1, y = 0.35*MDS2), #
label = mds.scfa$label2,
size = 4,
min.segment.length = 0,
segment.alpha = 0.8,
box.padding = 0.3,
force = 1) +
theme_pubr(legend = "top") +
scale_color_manual(values = params$COL, name = "Treatment") +
scale_fill_manual(values = params$COL) +
labs(x = "Axis 1", y = "Axis 2") +
guides(fill = "none")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
p
# Remove legend
leg <- get_legend(p)
p2 <- p + theme(legend.position = "none")
# Add marginal boxplots
p3 <- ggExtra::ggMarginal(p = p2, type = "boxplot", size = 10, groupFill = TRUE)
p3
# Add legend back to plot
p4 <- plot_grid(leg, p3, ncol = 1, rel_heights = c(0.1,1), rel_widths = 1)
p4
# Save output
suppressMessages(ggsave(filename = "plots/animal_data/scfa/PCOA_weighed.png", plot = p3, device = "png", dpi = 300, height = 140, width = 140, units = "mm"))
suppressMessages(ggsave(filename = "plots/animal_data/scfa/PCOA_weighed.pdf", plot = p3, device = "pdf", dpi = 300, height = 140, width = 140, units = "mm"))
######################################
# # BRAY-CURTIS
# dist.bray <- vegdist(as.matrix(dat.SCFA), method = "bray")
#
# # Ordination
# bray.pcoa <- pcoa(dist.bray)
#
#
# bray.df <- data.frame(pcoa1 = bray.pcoa$vectors[,1],
# pcoa2 = bray.pcoa$vectors[,2],
# pcoa3 = bray.pcoa$vectors[,3],
# pcoa4 = bray.pcoa$vectors[,4],
# pcoa5 = bray.pcoa$vectors[,5])
#
# # Add metadata
# dat.pcoa <- cbind(bray.df,
# rat_name = dat.clean$rat_name,
# treatment = dat.clean$treatment,
# pfos = dat.clean$pfos,
# van = dat.clean$van)
#
# # PERMANOVA
# adonis2(dist.bray ~ van*pfos, data = dat.clean)
#
# # Create PCoA plot
# p.pcoa <- ggplot(dat.pcoa, aes(x = pcoa1, y = pcoa2, color = treatment)) +
# geom_point() +
# theme_pubr(legend = "right") +
# stat_ellipse() +
# scale_color_manual(values = params$COL, name = "Treatment") #name = "Group", labels = c("No fibre","No fibre + PFOS", "Fibre","Fibre + PFOS")
# # scale_shape_manual(values = c(16,17), name = "Dissection day", labels = c("Day 8","Day 21"))
# p.pcoa
# p.pcoa2 <- p.pcoa +theme(legend.position = "none")
#
# # Recover legend
# leg <- get_legend(p.pcoa)
#
# # Add marginal boxplots
# p.pcoa3 <- ggExtra::ggMarginal(p = p.pcoa2, type = 'boxplot', size = 10, groupFill = TRUE)
# # Organize plot with legend
# p.pcoa4 <- plot_grid(leg, p.pcoa3, rel_widths = c(1,0.1))
# p.pcoa4
#
# # Save output
# suppressMessages(ggsave(filename = "plots/animal_data/scfa/PCOA.png", plot = p.pcoa3, device = "png", dpi = 300, height = 140, width = 140, units = "mm"))
# suppressMessages(ggsave(filename = "plots/animal_data/scfa/PCOA.pdf", plot = p.pcoa3, device = "pdf", dpi = 300, height = 140, width = 140, units = "mm"))
7.2 Formic acid / Formate
7.2.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "formic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$formic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL formic 12 0.069 0.239
## 2 PFOS formic 12 0.244 0.484
## 3 VAN formic 12 0.12 0.283
## 4 VAN+PFOS formic 12 0.298 0.466
7.2.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
7.2.3 Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group, and there is no relationship between the observations in each group. This has already been addressed for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
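The boxplot rule flags values that lie more than 1.5 × IQR below the first quartile or above the third quartile, and treats values beyond 3 × IQR as extreme. The following is a small base-R sketch of that rule on a toy vector; the helper flag_outliers is hypothetical and is not the rstatix implementation.

```r
# Boxplot (IQR) rule: hypothetical helper, not the rstatix implementation
flag_outliers <- function(x, coef = 1.5) {
  q   <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Toy group: eleven values at zero and one detected value
x <- c(rep(0, 11), 0.83)
flag_outliers(x)            # only the last value is flagged
flag_outliers(x, coef = 3)  # it is also flagged under the 3 x IQR rule
```

When most values in a group are identical, the IQR collapses to zero and every differing value is flagged, so flagged points in sparsely detected compounds need to be interpreted with care.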
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 6 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R08 8 no no 256. 265. 268. 274. 278. 283
## 2 PFOS R28 16 yes no 242. 248. 252. 265. 268. 273
## 3 PFOS R31 19 yes no 283. 284. 293. 301. 307. 315
## 4 PFOS R32 20 yes no 255. 262. 269. 276. 281 291
## 5 VAN R14 26 no yes 246. 256. 260. 267. 270. 274
## 6 VAN R17 29 no yes 268. 278. 274. 290. 295 295
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.732 0.0000000500
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.922 0.438
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
Formic acid data contain very few values above the limit of detection (LOD = 0.6), with a total of 10 valid data points. None of the values flagged by identify_outliers() are excluded, the Shapiro-Wilk test rejects normality and Levene’s test shows equal variance. We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s test with p-value adjustment.
Kruskal-Wallis test
7.2.3.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 formic 48 2.44 3 0.486 Kruskal-Wallis
7.2.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as a measure of the Kruskal-Wallis effect size. The interpretation values commonly used in the published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 formic 48 -0.0127 eta2[H] small
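The reported effect size can be reproduced by hand from the Kruskal-Wallis output, using the usual definition eta2[H] = (H - k + 1) / (n - k), where H is the test statistic, k the number of groups and n the number of observations. The helper name kw_eta2 below is hypothetical.

```r
# Eta squared from the Kruskal-Wallis H statistic: (H - k + 1) / (n - k)
kw_eta2 <- function(H, k, n) (H - k + 1) / (n - k)

# Values reported above: H = 2.44, k = 4 treatment groups, n = 48 animals
round(kw_eta2(H = 2.44, k = 4, n = 48), 4)   # -0.0127
```

A negative estimate arises whenever H is smaller than k - 1 and is usually read as no detectable effect, even though it falls under the "small" label.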
7.2.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 formic CTRL PFOS 12 12 0.986 0.324 0.648 ns
## 2 formic CTRL VAN 12 12 0.400 0.689 0.689 ns
## 3 formic CTRL VAN+PFOS 12 12 1.45 0.148 0.648 ns
## 4 formic PFOS VAN 12 12 -0.585 0.558 0.689 ns
## 5 formic PFOS VAN+PFOS 12 12 0.462 0.644 0.689 ns
## 6 formic VAN VAN+PFOS 12 12 1.05 0.295 0.648 ns
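The adjustment method behind the `p.adj` column is not stated in this section. The adjusted values in the table are, however, consistent with a Benjamini-Hochberg correction of the unadjusted `p` column, which can be checked with base R. This is an inference from the printed numbers, not a confirmed setting.

```r
# Unadjusted Dunn p-values from the formic acid table, in row order
p_raw <- c(0.324, 0.689, 0.148, 0.558, 0.644, 0.295)

# Benjamini-Hochberg adjustment (assumed method, inferred from the output)
p_bh <- p.adjust(p_raw, method = "BH")
round(p_bh, 3)
# 0.648 0.689 0.648 0.689 0.689 0.648
```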
7.2.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM formate",limits = c(0,1.6),breaks = seq(0,1.6,0.5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.6, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p
## Warning: Removed 22 rows containing missing values (`geom_point()`).
p.formic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.formic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 22 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 22 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
7.3 Acetic acid / Acetate
7.3.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "acetic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
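# Remove NA in the data column and the extreme outlier R08
# (is.na(OUTCOME) would test the string "acetic" itself, so index the column instead)
dat.clean <- subset(dat, !is.na(dat[[OUTCOME]]) & dat$rat_name != "R08")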
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL acetic 11 7.84 1.23
## 2 PFOS acetic 12 7.19 2.49
## 3 VAN acetic 12 3.02 0.674
## 4 VAN+PFOS acetic 12 3.29 1.04
7.3.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
7.3.3 Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
  + This is already done for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 1 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R10 10 no no 266. 273. 275. 285. 291. 294
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers, where one is extreme (R08). This outlier has been removed from the analysis.
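For reference, the boxplot method flags a value as an outlier when it lies more than 1.5 × IQR below the first quartile or above the third quartile, and as extreme when it lies more than 3 × IQR away. The base R sketch below illustrates the rule on made-up values. `flag_outliers` is a hypothetical helper, and the numbers are not taken from the project data.

```r
# Hypothetical helper mirroring the boxplot rule on a numeric vector
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - coef * iqr | x > q[[2]] + coef * iqr
}

# Made-up concentrations (mM), for illustration only
x <- c(7.1, 7.4, 7.8, 8.0, 8.2, 8.5, 8.9, 11.5, 25.0)

data.frame(
  value      = x,
  is.outlier = flag_outliers(x, coef = 1.5),  # beyond 1.5 x IQR
  is.extreme = flag_outliers(x, coef = 3)     # beyond 3 x IQR
)
```

In this toy vector 11.5 is flagged as an outlier but not as extreme, while 25.0 is flagged as both.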
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.966 0.189
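The call that produces the Shapiro-Wilk table is not shown in this section. A minimal sketch of the same check with base R's `shapiro.test()` is given below. It runs on simulated data (`toy` is a hypothetical stand-in for `dat.clean`), so the numbers will not match the table above.

```r
set.seed(1)
# Simulated stand-in for dat.clean: 4 groups x 12 animals (hypothetical values)
toy <- data.frame(
  treatment = rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12),
  acetic    = c(rnorm(24, mean = 7.5, sd = 1.5), rnorm(24, mean = 3.1, sd = 0.8))
)

# Fit the one-way model and test its residuals for normality
model <- lm(acetic ~ treatment, data = toy)
sw <- shapiro.test(residuals(model))

# W statistic and p-value; p > 0.05 gives no evidence against normal residuals
c(statistic = unname(sw$statistic), p.value = sw$p.value)
```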
Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
- It’s also possible to use the Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 43 8.35 0.000174
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
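The Levene call itself is not shown here. With the median as centre, Levene's test is equivalent to a one-way ANOVA on the absolute deviations of each value from its group median, which the base R sketch below reproduces on simulated data (`toy` and its values are hypothetical).

```r
set.seed(2)
# Simulated stand-in data: two groups with large spread, two with small spread
toy <- data.frame(
  treatment = factor(rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12)),
  acetic    = c(rnorm(24, mean = 7.5, sd = 2.0), rnorm(24, mean = 3.1, sd = 0.5))
)

# Absolute deviation of each value from its group median
toy$abs_dev <- abs(toy$acetic - ave(toy$acetic, toy$treatment, FUN = median))

# One-way ANOVA on the deviations: F and p correspond to Levene's statistic and p
lev <- anova(lm(abs_dev ~ treatment, data = toy))
lev[1, c("Df", "F value", "Pr(>F)")]
```

The "group coerced to factor" warning in the output appears because `treatment` is stored as a character column. Converting it with `factor()` before testing, as in the sketch, avoids the warning without changing the result.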
Most data for this set is above the Limit of Detection (= 0.5). The acetic acid concentration has two outliers, of which one is extreme and has been removed from the analysis. The Shapiro-Wilk test shows normality, but Levene's test shows unequal variance. Therefore we use a non-parametric Kruskal-Wallis test followed by Dunn's post-hoc test with adjusted p-values.
7.3.4 Kruskal-Wallis test
7.3.4.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 acetic 47 31.4 3 0.000000692 Kruskal-Wallis
7.3.4.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 acetic 47 0.661 eta2[H] large
7.3.4.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 acetic CTRL PFOS 11 12 -0.478 0.633 0.743 ns
## 2 acetic CTRL VAN 11 12 -4.31 0.0000165 0.0000992 ****
## 3 acetic CTRL VAN+PFOS 11 12 -3.99 0.0000670 0.000181 ***
## 4 acetic PFOS VAN 12 12 -3.92 0.0000903 0.000181 ***
## 5 acetic PFOS VAN+PFOS 12 12 -3.59 0.000333 0.000500 ***
## 6 acetic VAN VAN+PFOS 12 12 0.328 0.743 0.743 ns
7.3.4.1 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
# Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM acetate",limits = c(0,15),breaks = seq(0,15,2)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.5, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = FALSE, y.position = c(14,15,12,13))
p
p.acetic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.acetic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
7.4 Propanoic acid / Propionate
7.4.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "propanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$propanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL propanoic 12 0.268 0.113
## 2 PFOS propanoic 12 0.265 0.095
## 3 VAN propanoic 12 0.264 0.086
## 4 VAN+PFOS propanoic 12 0.325 0.102
7.4.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
#### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
  + This is already done for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 6 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R08 8 no no 256. 265. 268. 274. 278. 283
## 2 PFOS R28 16 yes no 242. 248. 252. 265. 268. 273
## 3 VAN R15 27 no yes 268. 277. 283. 290. 296. 300
## 4 VAN R18 30 no yes 266. 275. 282. 285. 288. 298
## 5 VAN R22 34 no yes 292. 296. 301. 313. 311. 321
## 6 VAN+PFOS R43 43 yes yes 292. 301. 300. 313. 316. 322
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains six outliers, none of which are extreme. As removing the outliers does not affect the final outcome, they are left in the analysis.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.980 0.586
Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
- It’s also possible to use the Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.280 0.840
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that propanoic acid concentration has six outliers, the Shapiro-Wilk test shows normality and Levene's test shows equal variance. Therefore we use a one-way ANOVA with Tukey's honest significant difference (HSD) post-hoc test.
### ANOVA One-Way test
7.4.2.1 Perform test
We can now run a one-way ANOVA: anova_test() if the variances are equal, or
welch_anova_test() if the variances differ.
if(EQUAL.VAR) {
res.aov <- dat.clean %>% anova_test(FORMULA)
res.aov
} else {
res.aov <- dat.clean %>% welch_anova_test(FORMULA)
res.aov
}
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 treatment 3 44 1.068 0.372 0.068
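The `ges` column (generalized eta squared) can be recovered from the F statistic and the degrees of freedom. The sketch below assumes a one-way between-subjects design, where ges reduces to SS_effect / (SS_effect + SS_error) and SS_effect / SS_error equals F × DFn / DFd.

```r
# Values from the ANOVA table above
F_val <- 1.068
dfn   <- 3
dfd   <- 44

# ges = SS_effect / (SS_effect + SS_error), rewritten in terms of F and the dfs
ges <- (F_val * dfn) / (F_val * dfn + dfd)
round(ges, 3)
# 0.068
```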
7.4.2.2 Perform posthoc test
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. When running relaxed Welch one-way test, the Games-Howell post hoc test or pairwise t-tests (with no assumption of equal variances) can be used to compare all possible combinations of group differences.
if(EQUAL.VAR) {
pwc <- dat.clean %>% tukey_hsd(FORMULA)
pwc
} else {
pwc <- dat.clean %>% games_howell_test(FORMULA)
pwc
}
## # A tibble: 6 × 9
## term group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 treat… CTRL PFOS 0 -0.00295 -0.111 0.105 1 ns
## 2 treat… CTRL VAN 0 -0.00466 -0.113 0.104 0.999 ns
## 3 treat… CTRL VAN+P… 0 0.0566 -0.0516 0.165 0.508 ns
## 4 treat… PFOS VAN 0 -0.00171 -0.110 0.107 1 ns
## 5 treat… PFOS VAN+P… 0 0.0596 -0.0487 0.168 0.464 ns
## 6 treat… VAN VAN+P… 0 0.0613 -0.0470 0.170 0.44 ns
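For readers without rstatix at hand, the same kind of comparison can be sketched with base R's `TukeyHSD()`, which `tukey_hsd()` wraps. The example runs on simulated data (`toy` is hypothetical), so the estimates differ from the table above.

```r
set.seed(3)
# Simulated stand-in for dat.clean (hypothetical values)
toy <- data.frame(
  treatment = factor(rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12)),
  propanoic = rnorm(48, mean = 0.28, sd = 0.1)
)

# One-way ANOVA followed by Tukey's HSD on the treatment factor
fit <- aov(propanoic ~ treatment, data = toy)
tk  <- TukeyHSD(fit, "treatment")

# One row per pairwise comparison; columns: diff, lwr, upr, p adj
tk$treatment
```

Each `diff` is the mean of the second-named group minus the mean of the first, which matches the sign convention of the `estimate` column (group2 minus group1).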
7.4.3 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM propionate",limits = c(0,0.6),breaks = seq(0,0.6,0.2)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p
## Warning: Removed 1 rows containing missing values (`geom_point()`).
p.propanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.propanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
7.5 2-methyl-Propanoic acid / Isobutyrate
7.5.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m2_propanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m2_propanoic)) # & !dat$rat_name == "R01")
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL m2_propanoic 12 0.049 0.025
## 2 PFOS m2_propanoic 12 0.05 0.017
## 3 VAN m2_propanoic 12 0.005 0.018
## 4 VAN+PFOS m2_propanoic 12 0.01 0.027
7.5.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
  + This is already done for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 5 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R01 1 no no 310 321. 325. 339. 350 354
## 2 CTRL R08 8 no no 256. 265. 268. 274. 278. 283
## 3 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 4 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## 5 VAN+PFOS R47 47 yes yes 242. 249. 255. 263. 267. 271
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains five outliers. Excluding R01 (an extreme outlier) only leads to a new, non-extreme outlier, so the exclusion is commented out in the code above and all animals are kept in the analysis.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.755 0.000000145
Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
- It’s also possible to use the Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.633 0.598
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that 2-methyl-propanoic acid concentration has five outliers, of which four are extreme. These arise from several samples being 0, and on that basis they are left in. The Shapiro-Wilk test shows non-normal residuals (p < 0.001), while Levene's test shows equal variance (p = 0.598). Furthermore, very few samples above the Limit of Detection (= 0.02) are observed in vancomycin-treated samples. Therefore we use a non-parametric Kruskal-Wallis test followed by Dunn's post-hoc test with adjusted p-values.
7.5.3 Kruskal-Wallis test
7.5.3.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 m2_propanoic 48 26.2 3 0.00000882 Kruskal-Wallis
7.5.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 m2_propanoic 48 0.526 eta2[H] large
7.5.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 m2_propanoic CTRL PFOS 12 12 0.0383 9.69e-1 9.69e-1 ns
## 2 m2_propanoic CTRL VAN 12 12 -3.73 1.94e-4 5.82e-4 ***
## 3 m2_propanoic CTRL VAN+PF… 12 12 -3.46 5.44e-4 8.15e-4 ***
## 4 m2_propanoic PFOS VAN 12 12 -3.76 1.67e-4 5.82e-4 ***
## 5 m2_propanoic PFOS VAN+PF… 12 12 -3.50 4.71e-4 8.15e-4 ***
## 6 m2_propanoic VAN VAN+PF… 12 12 0.268 7.88e-1 9.46e-1 ns
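The calls behind the three Kruskal-Wallis tables are not shown in this section. The tables have the shape returned by the rstatix functions `kruskal_test()`, `kruskal_effsize()` and `dunn_test()`, so a plausible sketch of the pipeline is given below. It runs on simulated data (`toy` is hypothetical), and `p.adjust.method = "BH"` is an assumption rather than a documented setting.

```r
library(rstatix)

set.seed(4)
# Simulated stand-in for dat.clean (hypothetical values)
toy <- data.frame(
  treatment    = rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12),
  m2_propanoic = c(rexp(24, rate = 20), rexp(24, rate = 100))
)

res.kruskal <- kruskal_test(toy, m2_propanoic ~ treatment)     # omnibus test
res.effsize <- kruskal_effsize(toy, m2_propanoic ~ treatment)  # eta2[H]
pwc <- dunn_test(toy, m2_propanoic ~ treatment,
                 p.adjust.method = "BH")                       # assumed adjustment
pwc
```

The pairwise result is stored as `pwc` because that is the object name the figure code expects when it adds x positions and formatted p-values.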
7.5.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM isobutyrate",limits = c(0,0.11),breaks = seq(0,0.11,0.02)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.02, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.102,0.11,0.096,0.088))
p
## Warning: Removed 13 rows containing missing values (`geom_point()`).
p.m2p <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m2p,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 13 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 13 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
7.6 Butanoic acid / Butyrate
7.6.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "butanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$butanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL butanoic 12 1.28 0.655
## 2 PFOS butanoic 12 1.14 0.625
## 3 VAN butanoic 12 0.187 0.195
## 4 VAN+PFOS butanoic 12 0.193 0.109
7.6.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group.
  + This is already done for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN R20 32 no yes 285 293. 301. 310. 317. 321
## 2 VAN+PFOS R48 48 yes yes 224. 229. 234. 239. 242. 250
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
Data contains two outliers, where one is extreme (R20). This outlier does not affect the final results or type of analysis and has been left in.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.938 0.0130
Test homogeneity of variance assumption
1. The residuals versus fits plot can be used to check the homogeneity
of variances.
- It’s also possible to use the Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 7.69 0.000309
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
Most data for this set is above the Limit of Detection (= 0.03). The butanoic acid concentration has two outliers, of which one is extreme (R20); it does not affect the results and has been left in the analysis. The Shapiro-Wilk test shows non-normal residuals and Levene's test shows unequal variance. Therefore we use a non-parametric Kruskal-Wallis test followed by Dunn's post-hoc test with adjusted p-values.
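The choice of test across the SCFA sections follows one pattern: the parametric route is taken only when both the Shapiro-Wilk and the Levene p-values exceed 0.05. The sketch below summarises that rule. `choose_test` is a hypothetical helper and the 0.05 threshold is an assumption, since neither is defined in the analysis code.

```r
# Hypothetical helper summarising the decision rule used in these sections
choose_test <- function(shapiro_p, levene_p, alpha = 0.05) {
  normal    <- shapiro_p > alpha   # residuals compatible with normality
  equal_var <- levene_p > alpha    # group variances compatible with equality
  if (normal && equal_var) "one-way ANOVA + Tukey HSD" else "Kruskal-Wallis + Dunn"
}

choose_test(shapiro_p = 0.0130, levene_p = 0.000309)  # butanoic
# "Kruskal-Wallis + Dunn"
choose_test(shapiro_p = 0.586, levene_p = 0.840)      # propanoic
# "one-way ANOVA + Tukey HSD"
```

For normal residuals with unequal variances, a Welch ANOVA with Games-Howell comparisons would be an alternative; the propanoic acid code already contains that branch.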
7.6.3 Kruskal-Wallis test
7.6.3.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 butanoic 48 27.4 3 0.0000048 Kruskal-Wallis
7.6.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 butanoic 48 0.555 eta2[H] large
7.6.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 butanoic CTRL PFOS 12 12 -0.0729 0.942 0.942 ns
## 2 butanoic CTRL VAN 12 12 -3.95 0.0000777 0.000315 ***
## 3 butanoic CTRL VAN+PFOS 12 12 -3.50 0.000467 0.000918 ***
## 4 butanoic PFOS VAN 12 12 -3.88 0.000105 0.000315 ***
## 5 butanoic PFOS VAN+PFOS 12 12 -3.43 0.000612 0.000918 ***
## 6 butanoic VAN VAN+PFOS 12 12 0.452 0.651 0.782 ns
7.6.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM butyrate",limits = c(0,3),breaks = seq(0,3,0.5)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(2.6,3,2.4,2.8))
p
## Warning: Removed 1 rows containing missing values (`geom_point()`).
p.butanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.butanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
7.7 3-methyl-Butanoic acid / Isovalerate
7.7.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m3_butanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m3_butanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL m3_butanoic 12 0.03 0.017
## 2 PFOS m3_butanoic 12 0.033 0.012
## 3 VAN m3_butanoic 12 0.005 0.018
## 4 VAN+PFOS m3_butanoic 12 0.006 0.022
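The formula construction in the chunk above is written so that it also works with more than one predictor, in which case the predictors are joined with `*` to include their interaction. As a sketch, assuming a hypothetical two-factor setup that uses the `pfos` and `van` columns of the data instead of `treatment`:

```r
# Hypothetical two-factor design (not the setup used in this section)
PREDICTOR <- c("pfos", "van")
OUTCOME <- "m3_butanoic"

# Same construction as in the chunk above
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME, PREDICTOR.F, sep = " ~ "))
FORMULA
```

With a single predictor, as in this section, `PREDICTOR.F` is simply the predictor name and the formula reduces to a one-way model.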
7.7.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This has already been ensured for the whole project.
- No significant outliers in any of the groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 3 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R01 1 no no 310 321. 325. 339. 350 354
## 2 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 3 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
The data contain three outliers, which arise because several data points are below the Limit of Detection. Furthermore, removing the extreme outliers does not affect the result of the analysis; they have therefore been left in.
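The boxplot rule flags a value as an outlier when it lies more than 1.5 x IQR beyond the quartiles, and as extreme when it lies more than 3 x IQR beyond them. When most values in a group are non-detects, the IQR collapses to zero and any detected value is flagged. The sketch below illustrates this in base R with made-up values, assuming non-detects are recorded as 0; it is not the study data.

```r
# Illustrative values only: eleven non-detects recorded as 0 and one detected value
x <- c(rep(0, 11), 0.06)

q <- quantile(x, probs = c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Boxplot rule: outlier beyond 1.5 x IQR, extreme beyond 3 x IQR from the quartiles
is_outlier <- x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr
is_extreme <- x < q[[1]] - 3 * iqr | x > q[[2]] + 3 * iqr

# Show only the flagged values
data.frame(x, is_outlier, is_extreme)[is_outlier, ]
```

Here the single detected value is flagged as both an outlier and extreme, even though it is a legitimate measurement.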
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.678 0.00000000551
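The call that produced the Shapiro-Wilk table above is not shown. A base R equivalent, applied to the residuals of a one-way linear model, could look like the following sketch. The data frame `toy` is made up for illustration and is not the study data.

```r
# Toy data (not the study data): mostly zeros with one detected value per group
toy <- data.frame(
  treatment = rep(c("A", "B"), each = 10),
  conc      = c(rep(0, 9), 1, rep(0, 9), 1)
)

# Fit the one-way linear model and test its residuals for normality
model_toy <- lm(conc ~ treatment, data = toy)
sw <- shapiro.test(residuals(model_toy))

sw$statistic  # W statistic
sw$p.value    # a small p-value indicates departure from normality
```

Data dominated by non-detects, as in this toy example, typically give a low W statistic, which matches the pattern seen in the table above.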
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.391 0.760
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that 3-methyl-butanoic acid concentration has three outliers, which have been left in. The Shapiro-Wilk test shows non-normality, but Levene’s test shows equal variance. Furthermore, very few samples above the Limit of Detection (= 0.02) are observed in the vancomycin-treated groups. We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s test with p-value adjustment.
7.7.3 Kruskal-Wallis test
7.7.3.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 m3_butanoic 48 25.6 3 0.0000117 Kruskal-Wallis
7.7.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 m3_butanoic 48 0.513 eta2[H] large
7.7.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 m3_butanoic CTRL PFOS 12 12 0.556 5.78e-1 6.94e-1 ns
## 2 m3_butanoic CTRL VAN 12 12 -3.29 9.96e-4 1.67e-3 **
## 3 m3_butanoic CTRL VAN+PFOS 12 12 -3.26 1.11e-3 1.67e-3 **
## 4 m3_butanoic PFOS VAN 12 12 -3.85 1.19e-4 4.05e-4 ***
## 5 m3_butanoic PFOS VAN+PFOS 12 12 -3.82 1.35e-4 4.05e-4 ***
## 6 m3_butanoic VAN VAN+PFOS 12 12 0.0309 9.75e-1 9.75e-1 ns
7.7.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM isovalerate",limits = c(0,0.1),breaks = seq(0,0.1,0.02)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.02, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.092,0.1,0.084,0.076))
p
## Warning: Removed 14 rows containing missing values (`geom_point()`).
p.m3b <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m3b,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
7.8 Pentanoic acid / Valerate
7.8.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "pentanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$pentanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL pentanoic 12 0.047 0.023
## 2 PFOS pentanoic 12 0.048 0.025
## 3 VAN pentanoic 12 0.004 0.015
## 4 VAN+PFOS pentanoic 12 0.007 0.026
7.8.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This has already been ensured for the whole project.
- No significant outliers in any of the groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 3 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS R26 14 yes no 238. 246. 248 257 259. 265
## 2 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 3 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
The data contain three outliers, which arise because several data points are below the Limit of Detection. Furthermore, removing the extreme outliers does not affect the result of the analysis; they have therefore been left in.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.820 0.00000372
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 1.72 0.177
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that pentanoic acid concentration has three outliers, as identified above, which have been left in for the analysis. The Shapiro-Wilk test shows non-normality and Levene’s test shows equal variance. We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s test with p-value adjustment.
7.8.3 Kruskal-Wallis test
7.8.3.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 pentanoic 48 27.3 3 0.00000509 Kruskal-Wallis
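The chunk that ran the test is hidden in this report. For orientation, the same kind of test can be run in base R with `kruskal.test()`; the sketch below uses a made-up data frame `toy` with three groups, not the study data, and is not necessarily the call used to produce the table above.

```r
# Toy data (not the study data): three groups with clearly separated values
toy <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 4)),
  conc      = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
)

# Rank-based one-way comparison of conc across treatment groups
kw <- kruskal.test(conc ~ treatment, data = toy)

kw$statistic  # H statistic (Kruskal-Wallis chi-squared)
kw$parameter  # degrees of freedom = number of groups - 1
kw$p.value
```

The degrees of freedom equal the number of groups minus one, which is why the four treatment groups in this report give df = 3.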
7.8.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 pentanoic 48 0.552 eta2[H] large
7.8.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 pentanoic CTRL PFOS 12 12 0.0155 0.988 0.988 ns
## 2 pentanoic CTRL VAN 12 12 -3.76 0.000173 0.000448 ***
## 3 pentanoic CTRL VAN+PFOS 12 12 -3.62 0.000299 0.000448 ***
## 4 pentanoic PFOS VAN 12 12 -3.77 0.000163 0.000448 ***
## 5 pentanoic PFOS VAN+PFOS 12 12 -3.63 0.000282 0.000448 ***
## 6 pentanoic VAN VAN+PFOS 12 12 0.139 0.889 0.988 ns
7.8.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM valerate",limits = c(0,0.11),breaks = seq(0,0.11,0.02)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.086,0.094,0.11,0.102))
p
## Warning: Removed 14 rows containing missing values (`geom_point()`).
p.pentanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.pentanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
7.9 4-methyl-Pentanoic acid / Isocaproate
7.9.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "m4_pentanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$m4_pentanoic))# & !dat$rat_name %in% c("R42","R45"))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL m4_pentanoic 12 0.018 0.023
## 2 PFOS m4_pentanoic 12 0.03 0.058
## 3 VAN m4_pentanoic 12 0.035 0.074
## 4 VAN+PFOS m4_pentanoic 12 0.216 0.513
7.9.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
#### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This has already been ensured for the whole project.
- No significant outliers in any of the groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 5 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 PFOS R25 13 yes no 339. 340. 353. 364. 348. 358
## 2 PFOS R28 16 yes no 242. 248. 252. 265. 268. 273
## 3 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 4 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## 5 VAN+PFOS R45 45 yes yes 234. 239. 244. 253. 262. 263
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
The data contain five outliers; however, these arise because several data points are below the Limit of Detection (= 0.03). These data points are therefore kept.
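To judge how strongly the detection limit affects a compound, it can help to tabulate the share of samples below it in each group. The following base R sketch does this for a made-up data frame `toy`; the values are illustrative only and `lod` is set to the limit quoted above.

```r
# Toy data (not the study data); lod is the detection limit quoted above
lod <- 0.03
toy <- data.frame(
  treatment = rep(c("CTRL", "VAN"), each = 4),
  conc      = c(0.05, 0.00, 0.04, 0.00, 0.00, 0.00, 0.00, 0.31)
)

# Share of samples below the detection limit in each group
below_lod <- tapply(toy$conc < lod, toy$treatment, mean)
below_lod
```

A group in which most samples fall below the limit has a quartile range close to zero, so the few detected values are reported as outliers by the boxplot rule.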
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.461 4.94e-12
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 1.62 0.198
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
This shows that 4-methyl-pentanoic acid concentration has five outliers and that only a few data points are above the Limit of Detection (= 0.03); the outliers are therefore kept. The Shapiro-Wilk test shows non-normality, but Levene’s test shows equal variance. We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s test with p-value adjustment.
7.9.3 Kruskal-Wallis test
7.9.3.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 m4_pentanoic 48 1.32 3 0.724 Kruskal-Wallis
7.9.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 m4_pentanoic 48 -0.0381 eta2[H] small
7.9.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 m4_pentanoic CTRL PFOS 12 12 -0.165 0.869 0.987 ns
## 2 m4_pentanoic CTRL VAN 12 12 0.0165 0.987 0.987 ns
## 3 m4_pentanoic CTRL VAN+PFOS 12 12 0.875 0.381 0.781 ns
## 4 m4_pentanoic PFOS VAN 12 12 0.182 0.856 0.987 ns
## 5 m4_pentanoic PFOS VAN+PFOS 12 12 1.04 0.298 0.781 ns
## 6 m4_pentanoic VAN VAN+PFOS 12 12 0.859 0.391 0.781 ns
7.9.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM isocaproate",limits = c(0,1.78),breaks = seq(0,1.78,0.2)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.03, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE)
p
## Warning: Removed 14 rows containing missing values (`geom_point()`).
p.m4p <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.m4p,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 14 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
7.10 Hexanoic acid / Caproate
7.10.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "hexanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$hexanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL hexanoic 12 0.053 0.041
## 2 PFOS hexanoic 12 0.07 0.058
## 3 VAN hexanoic 12 0.004 0.014
## 4 VAN+PFOS hexanoic 12 0.006 0.021
7.10.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
### Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. This has already been ensured for the whole project.
- No significant outliers in any of the groups.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 2 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 2 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
The data contain two outliers, both of which arise because the majority of data points in the vancomycin-treated groups are below the Limit of Detection (= 0.01). These outliers are therefore left in.
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.874 0.0000992
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 6.86 0.000687
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
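Levene's test can be understood as a one-way ANOVA on the absolute deviations of each observation from its group centre. The sketch below reproduces that idea in base R, assuming median centring (the default of `car::leveneTest()`, which the warning above refers to). The data frame `toy` is made up for illustration and is not the study data.

```r
# Toy data (not the study data): two tight groups and one widely spread group
toy <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 4)),
  conc      = c(1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0, 0.0, 3.0, 6.0, 9.0)
)

# Absolute deviation of each value from its group median
toy$abs_dev <- abs(toy$conc - ave(toy$conc, toy$treatment, FUN = median))

# One-way ANOVA on the deviations gives the Levene F statistic and p-value
lev <- anova(lm(abs_dev ~ treatment, data = toy))
lev
```

The two degrees of freedom in the ANOVA table correspond to `df1` (groups - 1) and `df2` (observations - groups) in the output above, that is 3 and 44 for four groups of twelve animals.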
This shows that hexanoic acid concentration has two outliers and several data points below the Limit of Detection. The Shapiro-Wilk test shows non-normality and Levene’s test shows unequal variance. We therefore use a non-parametric Kruskal-Wallis test followed by Dunn’s test with p-value adjustment.
7.10.3 Kruskal-Wallis test
7.10.3.0.1 Perform test
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 hexanoic 48 28.0 3 0.00000356 Kruskal-Wallis
7.10.3.0.2 Effect size
The eta squared, based on the H-statistic, can be used as the measure of the Kruskal-Wallis test effect size. The interpretation values commonly used in published literature are: 0.01 to < 0.06 (small effect), 0.06 to < 0.14 (moderate effect) and >= 0.14 (large effect).
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 hexanoic 48 0.569 eta2[H] large
7.10.3.0.3 Post-hoc test if interaction is significant
A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups are different. It’s also possible to use the Wilcoxon’s test to calculate pairwise comparisons between group levels with corrections for multiple testing.
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 hexanoic CTRL PFOS 12 12 0.717 0.473 0.568 ns
## 2 hexanoic CTRL VAN 12 12 -3.39 0.000699 0.00139 **
## 3 hexanoic CTRL VAN+PFOS 12 12 -3.31 0.000927 0.00139 **
## 4 hexanoic PFOS VAN 12 12 -4.11 0.0000401 0.000168 ***
## 5 hexanoic PFOS VAN+PFOS 12 12 -4.03 0.0000560 0.000168 ***
## 6 hexanoic VAN VAN+PFOS 12 12 0.0779 0.938 0.938 ns
7.10.4 Create figure
## Prepare statistical information:
pwc.adj <- pwc %>%
add_x_position(x = PREDICTOR) %>%
p_format("p.adj", accuracy = 0.0001, trailing.zero = TRUE, new.col = TRUE)
# Format for ggplot
if (sum(pwc.adj$p.adj.signif != "ns") == 0) {
stat.sig <- pwc.adj %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
} else {
stat.sig <- pwc.adj[pwc.adj$p.adj.signif != "ns",] %>%
add_y_position(step.increase = 0.25) %>%
mutate(y.position = seq(min(y.position), max(y.position),length.out = n()))
}
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM caproate",limits = c(0,0.22),breaks = seq(0,0.22,0.05)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.205,0.22,0.175,0.19))
p
## Warning: Removed 15 rows containing missing values (`geom_point()`).
p.hexanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.hexanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 15 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 15 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
7.11 Heptanoic acid / Enanthate
7.11.1 Prepare data
# load data
load("R_objects/animal_data.Rdata")
params <- readRDS("R_objects/animal_params.RDS")
# Set names of variables
PREDICTOR <- "treatment"
OUTCOME <- "heptanoic" #c("acetic","formic","propanoic","m2_propanoic","butanoic","m3_butanoic","pentanoic","m4_pentanoic","hexanoic","heptanoic")
SUBJECT <- "rat_name"
# Remove NA in the data column
dat.clean <- subset(dat, !is.na(dat$heptanoic))
# Create formula
PREDICTOR.F <- ifelse(length(PREDICTOR) > 1, paste(PREDICTOR, collapse = "*"), PREDICTOR)
FORMULA <- as.formula(paste(OUTCOME,PREDICTOR.F, sep = " ~ "))
# Summary samples in groups
dat.clean %>% group_by(across(all_of(PREDICTOR))) %>% get_summary_stats(!!sym(OUTCOME), type = "mean_sd")
## # A tibble: 4 × 5
## treatment variable n mean sd
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CTRL heptanoic 12 0.004 0.012
## 2 PFOS heptanoic 12 0 0
## 3 VAN heptanoic 12 0.009 0.033
## 4 VAN+PFOS heptanoic 12 0.008 0.028
7.11.2 Visualise
Create a boxplot of the data.
# Create plot
bxp <- dat.clean %>%
ggboxplot(x = if_else(length(PREDICTOR) > 1, PREDICTOR[2],PREDICTOR[1]),
y = OUTCOME,
color = PREDICTOR[1],
facet.by = if(length(PREDICTOR) == 3) PREDICTOR[3],
palette = params$COL)
bxp
Assumptions and preliminary tests
The ANOVA tests assume the following characteristics about the data:
- Independence of the observations. Each subject should belong to only one group, and there should be no relationship between the observations in each group. This is already ensured for the whole project.
- No significant outliers in any group.
- Normality. The data for each group should be approximately normally distributed.
- Homogeneity of variances. The variance of the outcome variable should be equal in each group.
In this section, we’ll perform some preliminary tests to check whether these assumptions are met.
Identify outliers
Outliers can be easily identified using boxplot methods, implemented in
the R function identify_outliers() [rstatix package].
# Test for outliers
dat.clean %>%
group_by(across(all_of(PREDICTOR))) %>%
identify_outliers(!!sym(OUTCOME))
## # A tibble: 3 × 49
## treatment rat_name ordering pfos van bw_0 bw_1 bw_2 bw_3 bw_4 bw_5
## <chr> <chr> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 CTRL R10 10 no no 266. 273. 275. 285. 291. 294
## 2 VAN R19 31 no yes 256. 263. 269 279. 282. 287
## 3 VAN+PFOS R42 42 yes yes 240. 244. 251. 260 264. 272
## # ℹ 38 more variables: bw_6 <int>, bw_7 <dbl>, bw_8 <int>, bw_gain <dbl>,
## # cecum_wt <dbl>, cecum_wt_bw <dbl>, cecum_norm <dbl>, liver_wt <dbl>,
## # liver_wt_bw <dbl>, liver_norm <dbl>, tot_pfos4 <dbl>, blood_vol4_mL <dbl>,
## # pfos_serum4_ugml <dbl>, pfos_serum4_ug <dbl>, pfos_serum4_mg <dbl>,
## # pfos_serum4_pct <dbl>, tot_pfos8 <dbl>, blood_vol8_mL <dbl>,
## # pfos_serum8_ugml <dbl>, pfos_serum8_ug <dbl>, pfos_serum8_mg <dbl>,
## # pfos_serum8_pct <dbl>, pfos_change48_pct <dbl>, pfos_liver_ugg <dbl>, …
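The boxplot rule behind this kind of outlier check can be sketched in base R. This is an illustration only, not the rstatix implementation: `flag_outliers()` is a hypothetical helper and the vector `x` is invented data that mimics a group in which nearly all values are zero.

```r
# Hypothetical helper: flag values outside Q1 - coef*IQR and Q3 + coef*IQR
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - coef * iqr | x > q[[2]] + coef * iqr
}

# Invented example: 11 values at zero and a single detected value
x <- c(rep(0, 11), 0.04)

# Index of the flagged observation
which(flag_outliers(x))
```

When almost all values in a group are zero, the interquartile range is zero, so every non-zero value is flagged. Outliers reported for such a variable therefore mainly reflect how few values were detected.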
Check normality
QQ plot and Shapiro-Wilk test of normality are used to analyze the model
residuals.
# Build the linear model
model <- lm(FORMULA, data = dat.clean)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 × 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.399 9.66e-13
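The call that produced the Shapiro-Wilk table is not shown above. A minimal sketch of the same kind of check in base R, assuming a one-way linear model on treatment, could look as follows. The data frame `toy` and its values are invented for illustration and do not reproduce the project data.

```r
# Invented data: four groups of 12, mostly zeros with three detected values
toy <- data.frame(
  treatment = rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12),
  heptanoic = 0
)
toy$heptanoic[c(10, 31, 42)] <- c(0.04, 0.11, 0.10)

# Fit the linear model and test its residuals for normality
toy_model <- lm(heptanoic ~ treatment, data = toy)
sw <- shapiro.test(residuals(toy_model))

# W statistic and p value of the test
sw$statistic
sw$p.value
```

A small p value, as in the table above, indicates that the residuals deviate from a normal distribution, which is expected when most observations sit at zero.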
Test homogeneity of variance assumption
- The residuals versus fits plot can be used to check the homogeneity of variances.
- It’s also possible to use Levene’s test to check the homogeneity of variances:
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 3 44 0.446 0.722
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
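The warnings above come from `leveneTest()` in the car package. As a rough sketch of what the median-centred variant of Levene’s test computes, the test can be written in base R as a one-way ANOVA on the absolute deviations from each group median. The helper `levene_median()` and the simulated data are hypothetical and only illustrate the principle.

```r
# Hypothetical helper: Levene's test with median centring (Brown-Forsythe variant)
levene_median <- function(y, group) {
  group <- factor(group)  # convert explicitly so no coercion warning is needed
  dev <- abs(y - ave(y, group, FUN = median))
  anova(lm(dev ~ group))
}

# Simulated example: four groups of 12 observations
set.seed(42)
grp <- rep(c("CTRL", "PFOS", "VAN", "VAN+PFOS"), each = 12)
y <- rnorm(48)

res <- levene_median(y, grp)
res
```

With four groups of 12 animals the degrees of freedom are 3 and 44, as in the table above. Converting the grouping variable to a factor beforehand also avoids the "group coerced to factor" warning.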
For heptanoic acid, only three data points are above the limit of detection (= 0.03), which is deemed too few for analysis. No final analysis was therefore made.
7.11.3 Create figure
#Create plot
p <- ggboxplot(dat.clean, x = PREDICTOR, y = OUTCOME,
fill = PREDICTOR,
add = "jitter",
add.params = list(size = 1)) +
scale_fill_manual(values = params$COL) +
scale_y_continuous(name = "mM enanthate",limits = c(0,0.22),breaks = seq(0,0.22,0.05)) +
labs(fill = "Treatment") +
scale_x_discrete(name = "Treatment") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
geom_hline(yintercept = 0.01, linetype = "dashed", color = "#2f2f2f")
#p <- p + stat_pvalue_manual(stat.sig, label = "p.adj.format",tip.length = 0, hide.ns = TRUE, y.position = c(0.205,0.22,0.175,0.19))
p
## Warning: Removed 25 rows containing missing values (`geom_point()`).
p.heptanoic <- p
if (!file.exists("R_objects/scfa")) dir.create(file.path(getwd(), "R_objects/scfa"))
save(p.heptanoic,file = paste0("R_objects/scfa/scfa_",OUTCOME,".rdata"))
# Plot for saving without legend
p2 <- p + theme(legend.position = "none")
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.png"), p2, device = "png", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 25 rows containing missing values (`geom_point()`).
ggsave(filename = paste0("plots/animal_data/scfa/scfa_",OUTCOME,"_plot.pdf"), p2, device = "pdf", dpi = 300, units = "mm", width = 100, height = 100)
## Warning: Removed 25 rows containing missing values (`geom_point()`).
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
7.12 SCFA ggarrange
Here all plots from the SCFA analysis are combined into one figure.
params <- readRDS("R_objects/animal_params.RDS")
# Load rdata files with scfa plots
# list.files() expects a regular expression, so match the file extension explicitly
pfiles <- list.files(path = "R_objects/scfa/", pattern = "\\.rdata$", full.names = TRUE)
lapply(pfiles, load,.GlobalEnv)
## [[1]]
## [1] "p.acetic"
##
## [[2]]
## [1] "p.butanoic"
##
## [[3]]
## [1] "p.formic"
##
## [[4]]
## [1] "p.heptanoic"
##
## [[5]]
## [1] "p.hexanoic"
##
## [[6]]
## [1] "p.m2p"
##
## [[7]]
## [1] "p.m3b"
##
## [[8]]
## [1] "p.m4p"
##
## [[9]]
## [1] "p.pentanoic"
##
## [[10]]
## [1] "p.propanoic"
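Loading every file into the global environment works here because each file contains exactly one uniquely named plot object. As an alternative sketch, the objects can be loaded into a separate environment and collected in a named list. The temporary folder, file names and placeholder objects below are invented for the example and stand in for the saved ggplot objects.

```r
# Invented stand-ins for saved plot objects, written to a temporary folder
tmp <- file.path(tempdir(), "scfa_demo")
dir.create(tmp, showWarnings = FALSE)
p.acetic <- "acetic plot"
p.formic <- "formic plot"
save(p.acetic, file = file.path(tmp, "scfa_acetic.rdata"))
save(p.formic, file = file.path(tmp, "scfa_formic.rdata"))
rm(p.acetic, p.formic)

# Load all files into a dedicated environment instead of the global one
pfiles <- list.files(tmp, pattern = "\\.rdata$", full.names = TRUE)
plot_env <- new.env()
loaded <- unlist(lapply(pfiles, load, envir = plot_env))

# Collect the loaded objects in a named list
plots <- mget(loaded, envir = plot_env)
names(plots)
```

A list of plots collected this way could be passed to `ggarrange()` through its `plotlist` argument, although the panel order would then follow the file order rather than the manual order used in this section.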
# Create plot
p.all <- ggarrange(p.formic,p.acetic,p.propanoic,p.m2p,p.butanoic,p.m3b,p.pentanoic,p.m4p,p.hexanoic,p.heptanoic,
ncol = 5, nrow = 2,
common.legend = TRUE,
legend = "top",
label.x = 0,
font.label = list(size = 24, face = "bold"),
labels = c("A","B","C","D","E","F","G","H","I","J"),
align = "hv")
## Warning: Removed 22 rows containing missing values (`geom_point()`).
## Removed 22 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 13 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 14 rows containing missing values (`geom_point()`).
## Removed 14 rows containing missing values (`geom_point()`).
## Removed 14 rows containing missing values (`geom_point()`).
## Warning: Removed 15 rows containing missing values (`geom_point()`).
## Warning: Removed 25 rows containing missing values (`geom_point()`).
# Save graphics
ggsave(filename = "plots/animal_data/scfa/all.png", p.all, device = "png", dpi = 300, height = 200, width = 400, units = "mm")
ggsave(filename = "plots/animal_data/scfa/all.pdf", p.all, device = "pdf", dpi = 300, height = 200, width = 400, units = "mm")
# clear the environment and release memory
rm(list = ls(all.names = TRUE)) #will clear all objects includes hidden objects.
invisible(gc()) #free up memory and report the memory usage.
8 SETTINGS
Overview of the parameters and packages that were used for this analysis
8.1 PARAMETERS
The following parameters were set for this analysis:
8.2 PACKAGES
The analysis was run in R version 4.2.2 using the following packages:
pack <- data.frame(Package = (.packages()))
for (i in seq(nrow(pack))) pack$Version[i] <- as.character(packageVersion(pack$Package[i]))
kbl(pack[order(pack$Package),], row.names = F) %>% kable_classic(lightable_options = "striped")
| Package | Version |
|---|---|
| ape | 5.7.1 |
| base | 4.2.2 |
| cowplot | 1.1.1 |
| datasets | 4.2.2 |
| DAtest | 2.8.0 |
| decontam | 1.18.0 |
| dplyr | 1.1.0 |
| forcats | 1.0.0 |
| ggbreak | 0.1.1 |
| ggplot2 | 3.4.2 |
| ggpubr | 0.6.0 |
| ggrepel | 0.9.3 |
| graphics | 4.2.2 |
| grDevices | 4.2.2 |
| kableExtra | 1.3.4 |
| lattice | 0.20.45 |
| lubridate | 1.9.2 |
| methods | 4.2.2 |
| pals | 1.7 |
| permute | 0.9.7 |
| phangorn | 2.11.1 |
| pheatmap | 1.0.12 |
| phyloseq | 1.42.0 |
| plotly | 4.10.1 |
| purrr | 1.0.1 |
| readr | 2.1.4 |
| rstatix | 0.7.2 |
| stats | 4.2.2 |
| stringr | 1.5.0 |
| tibble | 3.1.8 |
| tidyr | 1.3.0 |
| tidyverse | 2.0.0 |
| utils | 4.2.2 |
| vegan | 2.6.4 |